public inbox for gdb-patches@sourceware.org
* [PATCH 0/5] Add Python API for the disassembler
@ 2021-10-13 21:59 Andrew Burgess
  2021-10-13 21:59 ` [PATCH 1/5] gdb: make disassembler fprintf callback a static member function Andrew Burgess
                   ` (5 more replies)
  0 siblings, 6 replies; 80+ messages in thread
From: Andrew Burgess @ 2021-10-13 21:59 UTC (permalink / raw)
  To: gdb-patches

I needed a mechanism to augment the output of the disassembler,
and I thought it would be neat if I could do this as a Python
extension, which led to this patch series.

The first four patches are some prep-work, and light refactoring,
before patch five which adds all the new functionality.

There's an overview of the new Python API in patch #5.

All feedback welcome,

Thanks,
Andrew

---

Andrew Burgess (5):
  gdb: make disassembler fprintf callback a static member function
  gdb/python: new gdb.architecture_names function
  gdb/python: move gdb.Membuf support into a new file
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook

 gdb/Makefile.in                        |   2 +
 gdb/NEWS                               |  46 ++
 gdb/data-directory/Makefile.in         |   1 +
 gdb/disasm.c                           |  63 +-
 gdb/disasm.h                           |  16 +-
 gdb/doc/gdb.texinfo                    |  14 +
 gdb/doc/python.texi                    | 261 +++++++
 gdb/extension-priv.h                   |  15 +
 gdb/extension.c                        |  20 +
 gdb/extension.h                        |  17 +
 gdb/guile/guile.c                      |   6 +-
 gdb/python/lib/gdb/disassembler.py     | 194 ++++++
 gdb/python/py-arch.c                   |  32 +
 gdb/python/py-disasm.c                 | 905 +++++++++++++++++++++++++
 gdb/python/py-inferior.c               | 182 +----
 gdb/python/py-membuf.c                 | 226 ++++++
 gdb/python/python-internal.h           |  27 +
 gdb/python/python.c                    |  16 +
 gdb/testsuite/gdb.base/style.exp       |  45 +-
 gdb/testsuite/gdb.python/py-arch.exp   |  51 ++
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 201 ++++++
 gdb/testsuite/gdb.python/py-disasm.py  | 538 +++++++++++++++
 23 files changed, 2696 insertions(+), 207 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/python/py-membuf.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4



* [PATCH 1/5] gdb: make disassembler fprintf callback a static member function
  2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
@ 2021-10-13 21:59 ` Andrew Burgess
  2021-10-20 20:40   ` Tom Tromey
  2021-10-13 21:59 ` [PATCH 2/5] gdb/python: new gdb.architecture_names function Andrew Burgess
                   ` (4 subsequent siblings)
  5 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2021-10-13 21:59 UTC (permalink / raw)
  To: gdb-patches

The disassemble_info structure has four callbacks.  Three of them are
static member functions within gdb_disassembler; the fourth is just a
global static function.

However, this fourth callback is still only used from the
disassemble_info struct, so there's no real reason for its special
handling.

This commit makes fprintf_disasm a static method within
gdb_disassembler.

There should be no user visible changes after this commit.
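
As a rough illustration of the pattern (not part of the patch), the
sketch below shows why a static member function can be handed to a C
API that expects a plain printf-style function pointer: it has no
'this' pointer, so the stream has to arrive through the void *
argument.  The names sketch_disassembler, sketch_fprintf_ftype, emit
and print_cb are made up for this example, and a std::string stands in
for GDB's ui_file.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstdio>
#include <string>

// Hypothetical C-style callback type, modelled on a printf-like hook.
typedef int (*sketch_fprintf_ftype) (void *stream, const char *format, ...);

// Hypothetical stand-in for a disassembler wrapper class.
class sketch_disassembler
{
public:
  explicit sketch_disassembler (std::string *out)
    : m_stream (out), m_printer (print_cb)
  {
    assert (out != nullptr);
  }

  // Emit some text through the stored callback, as a C library would.
  int emit (const char *mnemonic, int reg)
  {
    return m_printer (m_stream, "%s r%d", mnemonic, reg);
  }

private:
  // A static member has no 'this' pointer, so it converts to a plain
  // function pointer.  The destination is recovered from STREAM.
  static int print_cb (void *stream, const char *format, ...)
  {
    char buf[128];
    va_list args;

    va_start (args, format);
    std::vsnprintf (buf, sizeof (buf), format, args);
    va_end (args);
    static_cast<std::string *> (stream)->append (buf);
    // Return something non-negative to signal success.
    return 0;
  }

  void *m_stream;
  sketch_fprintf_ftype m_printer;
};
```

Constructing a sketch_disassembler around a std::string and calling
emit ("add", 3) appends the formatted text to that string.  Keeping the
callback private to the class is the point of the exercise: nothing
outside the class can call it by accident.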
---
 gdb/disasm.c | 31 +++++++++++++++----------------
 gdb/disasm.h |  3 +++
 2 files changed, 18 insertions(+), 16 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index dc6426718bb..c045dfc94a6 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -163,6 +163,20 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
   print_address (self->arch (), addr, self->stream ());
 }
 
+/* Format disassembler output to STREAM.  */
+
+int
+gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+{
+  va_list args;
+
+  va_start (args, format);
+  vfprintf_filtered ((struct ui_file *) stream, format, args);
+  va_end (args);
+  /* Something non -ve.  */
+  return 0;
+}
+
 static bool
 line_is_less_than (const deprecated_dis_line_entry &mle1,
 		   const deprecated_dis_line_entry &mle2)
@@ -711,21 +725,6 @@ do_assembly_only (struct gdbarch *gdbarch, struct ui_out *uiout,
   dump_insns (gdbarch, uiout, low, high, how_many, flags, NULL);
 }
 
-/* Initialize the disassemble info struct ready for the specified
-   stream.  */
-
-static int ATTRIBUTE_PRINTF (2, 3)
-fprintf_disasm (void *stream, const char *format, ...)
-{
-  va_list args;
-
-  va_start (args, format);
-  vfprintf_filtered ((struct ui_file *) stream, format, args);
-  va_end (args);
-  /* Something non -ve.  */
-  return 0;
-}
-
 /* Combine implicit and user disassembler options and return them
    in a newly-created string.  */
 
@@ -756,7 +755,7 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    di_read_memory_ftype read_memory_func)
   : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, file, fprintf_disasm);
+  init_disassemble_info (&m_di, file, dis_asm_fprintf);
   m_di.flavour = bfd_target_unknown_flavour;
   m_di.memory_error_func = dis_asm_memory_error;
   m_di.print_address_func = dis_asm_print_address;
diff --git a/gdb/disasm.h b/gdb/disasm.h
index d3642d8ca01..f6de33e3db8 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -82,6 +82,9 @@ class gdb_disassembler
      non-memory error.  */
   gdb::optional<CORE_ADDR> m_err_memaddr;
 
+  static int dis_asm_fprintf (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
-- 
2.25.4



* [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
  2021-10-13 21:59 ` [PATCH 1/5] gdb: make disassembler fprintf callback a static member function Andrew Burgess
@ 2021-10-13 21:59 ` Andrew Burgess
  2021-10-14  6:52   ` Eli Zaretskii
                     ` (2 more replies)
  2021-10-13 21:59 ` [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file Andrew Burgess
                   ` (3 subsequent siblings)
  5 siblings, 3 replies; 80+ messages in thread
From: Andrew Burgess @ 2021-10-13 21:59 UTC (permalink / raw)
  To: gdb-patches

Add a new function to the Python API, gdb.architecture_names().  This
function returns a list containing all of the supported architecture
names within the current build of GDB.

The values returned in this list are all of the possible values that
can be returned from gdb.Architecture.name().
---
 gdb/NEWS                             |  4 +++
 gdb/doc/python.texi                  |  9 +++++
 gdb/python/py-arch.c                 | 23 +++++++++++++
 gdb/python/python-internal.h         |  1 +
 gdb/python/python.c                  |  4 +++
 gdb/testsuite/gdb.python/py-arch.exp | 51 ++++++++++++++++++++++++++++
 6 files changed, 92 insertions(+)

diff --git a/gdb/NEWS b/gdb/NEWS
index bd26d2b1ec2..d001a03145d 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -45,6 +45,10 @@ maint show internal-warning backtrace
      event is triggered once GDB decides it is going to exit, but
      before GDB starts to clean up its internal state.
 
+  ** New function gdb.architecture_names(), which returns a list
+     containing all of the possible Architecture.name() values.  Each
+     entry is a string.
+
 *** Changes in GDB 11
 
 * The 'set disassembler-options' command now supports specifying options
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index 15bf9dc3e21..04192f906c8 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -557,6 +557,14 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@defun gdb.architecture_names ()
+Return a list containing all of the architecture names that the
+current build of @value{GDBN} supports.  Each architecture name is a
+string.  The names returned in this list are the same names as are
+returned from @code{gdb.Architecture.name ()}
+(@pxref{gdbpy_architecture_name,,Architecture.name ()}).
+@end defun
+
 @node Exception Handling
 @subsubsection Exception Handling
 @cindex python exceptions
@@ -5834,6 +5842,7 @@
 
 A @code{gdb.Architecture} class has the following methods:
 
+@anchor{gdbpy_architecture_name}
 @defun Architecture.name ()
 Return the name (string value) of the architecture.
 @end defun
diff --git a/gdb/python/py-arch.c b/gdb/python/py-arch.c
index 66f2d28b94a..3e7970ab764 100644
--- a/gdb/python/py-arch.c
+++ b/gdb/python/py-arch.c
@@ -271,6 +271,29 @@ archpy_register_groups (PyObject *self, PyObject *args)
   return gdbpy_new_reggroup_iterator (gdbarch);
 }
 
+/* Implementation of gdb.architecture_names().  Return a list of all the
+   BFD architecture names that GDB understands.  */
+
+PyObject *
+gdbpy_all_architecture_names (PyObject *self, PyObject *args)
+{
+  gdbpy_ref<> list (PyList_New (0));
+  if (list == nullptr)
+    return nullptr;
+
+  std::vector<const char *> name_list = gdbarch_printable_names ();
+  for (const char *name : name_list)
+    {
+      gdbpy_ref <> py_name (PyString_FromString (name));
+      if (py_name == nullptr)
+	return nullptr;
+      if (PyList_Append (list.get (), py_name.get ()) < 0)
+	return nullptr;
+    }
+
+ return list.release ();
+}
+
 void _initialize_py_arch ();
 void
 _initialize_py_arch ()
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index 022d4a67172..2ad3bc944a7 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -469,6 +469,7 @@ PyObject *objfpy_get_xmethods (PyObject *, void *);
 PyObject *gdbpy_lookup_objfile (PyObject *self, PyObject *args, PyObject *kw);
 
 PyObject *gdbarch_to_arch_object (struct gdbarch *gdbarch);
+PyObject *gdbpy_all_architecture_names (PyObject *self, PyObject *args);
 
 PyObject *gdbpy_new_register_descriptor_iterator (struct gdbarch *gdbarch,
 						  const char *group_name);
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 264f7c88ed6..6c1baa167d9 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -2299,6 +2299,10 @@ Set the value of the convenience variable $NAME." },
 Register a TUI window constructor." },
 #endif	/* TUI */
 
+  { "architecture_names", gdbpy_all_architecture_names, METH_NOARGS,
+    "architecture_names () -> List.\n\
+Return a list of all the architecture names GDB understands." },
+
   {NULL, NULL, 0, NULL}
 };
 
diff --git a/gdb/testsuite/gdb.python/py-arch.exp b/gdb/testsuite/gdb.python/py-arch.exp
index 4f971127197..415fbd475b0 100644
--- a/gdb/testsuite/gdb.python/py-arch.exp
+++ b/gdb/testsuite/gdb.python/py-arch.exp
@@ -62,3 +62,54 @@ if { ![is_address_zero_readable] } {
     gdb_test "python arch.disassemble(0, 0)" ".*gdb\.MemoryError.*" \
 	"test bad memory access"
 }
+
+# Test for gdb.architecture_names().  First we're going to grab the
+# complete list of architecture names using the 'complete' command.
+set arch_names []
+gdb_test_no_output "set max-completions unlimited"
+gdb_test_multiple "complete set architecture " "" {
+    -re "complete set architecture\[^\r\n\]+\r\n" {
+	exp_continue
+    }
+    -re "^set architecture \(\[^\r\n\]+\)\r\n" {
+	set arch $expect_out(1,string)
+	if { "$arch" != "auto" } {
+	    set arch_names [lappend arch_names $arch]
+	}
+	exp_continue
+    }
+    -re "^$gdb_prompt $" {
+	gdb_assert { [llength $arch_names] > 0 }
+    }
+}
+
+# Now find all of the architecture names using Python.
+set py_arch_names []
+gdb_test_no_output "python all_arch = gdb.architecture_names()"
+gdb_test_no_output "python all_arch.sort()"
+gdb_test_multiple "python print(\"\\n\".join((\"Arch: %s\" % a) for a in all_arch))" "" {
+    -re "python \[^\r\n\]+\r\n" {
+	exp_continue
+    }
+    -re "^Arch: \(\[^\r\n\]+\)\r\n" {
+	set arch $expect_out(1,string)
+	set py_arch_names [lappend py_arch_names $arch]
+	exp_continue
+    }
+    -re "$gdb_prompt $" {
+	gdb_assert { [llength $py_arch_names] > 0 }
+    }
+}
+
+# Check the two lists of architecture names are the same length, and
+# that the list contents all match.
+gdb_assert { [llength $arch_names] == [llength $py_arch_names] }
+set lists_match true
+foreach a $arch_names b $py_arch_names {
+    if { $a != $b } {
+	set lists_match false
+	verbose -log "Mismatch in architecture list '$a' != '$b'"
+	break
+    }
+}
+gdb_assert { $lists_match }
-- 
2.25.4



* [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file
  2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
  2021-10-13 21:59 ` [PATCH 1/5] gdb: make disassembler fprintf callback a static member function Andrew Burgess
  2021-10-13 21:59 ` [PATCH 2/5] gdb/python: new gdb.architecture_names function Andrew Burgess
@ 2021-10-13 21:59 ` Andrew Burgess
  2021-10-20 20:42   ` Tom Tromey
  2021-10-13 21:59 ` [PATCH 4/5] gdb: add extension language print_insn hook Andrew Burgess
                   ` (2 subsequent siblings)
  5 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2021-10-13 21:59 UTC (permalink / raw)
  To: gdb-patches

In a future commit I'm going to be creating gdb.Membuf objects from a
new file within gdb/python/py*.c.  Currently all gdb.Membuf objects
are created directly within infpy_read_memory (as a result of calling
gdb.Inferior.read_memory()).

Initially I split out the Membuf creation code into a new function,
and left the new function in gdb/python/py-inferior.c, however, it
felt a little random that the Membuf creation code should live with
the inferior handling code.

So, then I moved all of the Membuf related code out into a new file,
gdb/python/py-membuf.c.  The interface is gdbpy_buffer_to_membuf, which
wraps an array of bytes into a gdb.Membuf object.

Most of the code is moved directly from py-inferior.c with only minor
tweaks to layout and replacing NULL with nullptr, hence, I've left the
copyright date on py-membuf.c as 2009-2021 to match py-inferior.c.

Currently, the only user of this code is still py-inferior.c, but in
later commits this will change.

There should be no user visible changes after this commit.
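
For readers less familiar with the ownership convention used by the
new interface, here is a minimal standalone sketch (not GDB code) of
the same idea: the buffer is passed by value as a unique pointer, the
caller hands it over with std::move, and the callee releases it into
the object that will free it later.  Everything prefixed with sketch_
is hypothetical, std::free stands in for xfree, and a plain struct
stands in for the Python object.

```cpp
#include <cassert>
#include <cstdlib>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical deleter, playing the role of an xfree-based one.
struct sketch_free_deleter
{
  void operator() (void *p) const { std::free (p); }
};

using sketch_malloc_ptr
  = std::unique_ptr<unsigned char, sketch_free_deleter>;

// Hypothetical stand-in for the object that ends up owning the bytes.
struct sketch_membuf
{
  void *buffer;
  unsigned long addr;
  unsigned long length;
};

// Take ownership of BUFFER and wrap it.  If the wrapper cannot be
// allocated, BUFFER is still owned by the parameter and is freed
// automatically when this function returns.
sketch_membuf *
sketch_buffer_to_membuf (sketch_malloc_ptr buffer, unsigned long address,
			 unsigned long length)
{
  assert (buffer != nullptr);

  sketch_membuf *obj
    = static_cast<sketch_membuf *> (std::malloc (sizeof (sketch_membuf)));
  if (obj == nullptr)
    return nullptr;

  // From here on the wrapper owns the raw bytes.
  obj->buffer = buffer.release ();
  obj->addr = address;
  obj->length = length;
  return obj;
}

// Counterpart of the wrapper's destructor: free the bytes, then the
// wrapper itself.
void
sketch_membuf_free (sketch_membuf *obj)
{
  if (obj == nullptr)
    return;
  std::free (obj->buffer);
  std::free (obj);
}
```

A caller would allocate the bytes with malloc, wrap them in a
sketch_malloc_ptr, and pass std::move (bytes) to
sketch_buffer_to_membuf.  After the call the caller's pointer is
empty, which makes it hard to free the same bytes twice.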
---
 gdb/Makefile.in              |   1 +
 gdb/python/py-inferior.c     | 182 +---------------------------
 gdb/python/py-membuf.c       | 226 +++++++++++++++++++++++++++++++++++
 gdb/python/python-internal.h |   5 +
 gdb/python/python.c          |   1 +
 5 files changed, 236 insertions(+), 179 deletions(-)
 create mode 100644 gdb/python/py-membuf.c

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 4201f65e68d..ec5d332c145 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -407,6 +407,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-instruction.c \
 	python/py-lazy-string.c \
 	python/py-linetable.c \
+	python/py-membuf.c \
 	python/py-newobjfileevent.c \
 	python/py-objfile.c \
 	python/py-param.c \
diff --git a/gdb/python/py-inferior.c b/gdb/python/py-inferior.c
index c8de41dd009..aec8c0f73cb 100644
--- a/gdb/python/py-inferior.c
+++ b/gdb/python/py-inferior.c
@@ -62,18 +62,6 @@ extern PyTypeObject inferior_object_type
 
 static const struct inferior_data *infpy_inf_data_key;
 
-struct membuf_object {
-  PyObject_HEAD
-  void *buffer;
-
-  /* These are kept just for mbpy_str.  */
-  CORE_ADDR addr;
-  CORE_ADDR length;
-};
-
-extern PyTypeObject membuf_object_type
-    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("membuf_object");
-
 /* Require that INFERIOR be a valid inferior ID.  */
 #define INFPY_REQUIRE_VALID(Inferior)				\
   do {								\
@@ -514,7 +502,7 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
 {
   CORE_ADDR addr, length;
   gdb::unique_xmalloc_ptr<gdb_byte> buffer;
-  PyObject *addr_obj, *length_obj, *result;
+  PyObject *addr_obj, *length_obj;
   static const char *keywords[] = { "address", "length", NULL };
 
   if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "OO", keywords,
@@ -536,23 +524,8 @@ infpy_read_memory (PyObject *self, PyObject *args, PyObject *kw)
       GDB_PY_HANDLE_EXCEPTION (except);
     }
 
-  gdbpy_ref<membuf_object> membuf_obj (PyObject_New (membuf_object,
-						     &membuf_object_type));
-  if (membuf_obj == NULL)
-    return NULL;
-
-  membuf_obj->buffer = buffer.release ();
-  membuf_obj->addr = addr;
-  membuf_obj->length = length;
 
-#ifdef IS_PY3K
-  result = PyMemoryView_FromObject ((PyObject *) membuf_obj.get ());
-#else
-  result = PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj.get (), 0,
-					 Py_END_OF_BUFFER);
-#endif
-
-  return result;
+  return gdbpy_buffer_to_membuf (std::move (buffer), addr, length);
 }
 
 /* Implementation of Inferior.write_memory (address, buffer [, length]).
@@ -602,93 +575,6 @@ infpy_write_memory (PyObject *self, PyObject *args, PyObject *kw)
   Py_RETURN_NONE;
 }
 
-/* Destructor of Membuf objects.  */
-static void
-mbpy_dealloc (PyObject *self)
-{
-  xfree (((membuf_object *) self)->buffer);
-  Py_TYPE (self)->tp_free (self);
-}
-
-/* Return a description of the Membuf object.  */
-static PyObject *
-mbpy_str (PyObject *self)
-{
-  membuf_object *membuf_obj = (membuf_object *) self;
-
-  return PyString_FromFormat (_("Memory buffer for address %s, \
-which is %s bytes long."),
-			      paddress (python_gdbarch, membuf_obj->addr),
-			      pulongest (membuf_obj->length));
-}
-
-#ifdef IS_PY3K
-
-static int
-get_buffer (PyObject *self, Py_buffer *buf, int flags)
-{
-  membuf_object *membuf_obj = (membuf_object *) self;
-  int ret;
-
-  ret = PyBuffer_FillInfo (buf, self, membuf_obj->buffer,
-			   membuf_obj->length, 0,
-			   PyBUF_CONTIG);
-
-  /* Despite the documentation saying this field is a "const char *",
-     in Python 3.4 at least, it's really a "char *".  */
-  buf->format = (char *) "c";
-
-  return ret;
-}
-
-#else
-
-static Py_ssize_t
-get_read_buffer (PyObject *self, Py_ssize_t segment, void **ptrptr)
-{
-  membuf_object *membuf_obj = (membuf_object *) self;
-
-  if (segment)
-    {
-      PyErr_SetString (PyExc_SystemError,
-		       _("The memory buffer supports only one segment."));
-      return -1;
-    }
-
-  *ptrptr = membuf_obj->buffer;
-
-  return membuf_obj->length;
-}
-
-static Py_ssize_t
-get_write_buffer (PyObject *self, Py_ssize_t segment, void **ptrptr)
-{
-  return get_read_buffer (self, segment, ptrptr);
-}
-
-static Py_ssize_t
-get_seg_count (PyObject *self, Py_ssize_t *lenp)
-{
-  if (lenp)
-    *lenp = ((membuf_object *) self)->length;
-
-  return 1;
-}
-
-static Py_ssize_t
-get_char_buffer (PyObject *self, Py_ssize_t segment, char **ptrptr)
-{
-  void *ptr = NULL;
-  Py_ssize_t ret;
-
-  ret = get_read_buffer (self, segment, &ptr);
-  *ptrptr = (char *) ptr;
-
-  return ret;
-}
-
-#endif	/* IS_PY3K */
-
 /* Implementation of
    gdb.search_memory (address, length, pattern).  ADDRESS is the
    address to start the search.  LENGTH specifies the scope of the
@@ -957,12 +843,7 @@ gdbpy_initialize_inferior (void)
   gdb::observers::inferior_removed.attach (python_inferior_deleted,
 					   "py-inferior");
 
-  membuf_object_type.tp_new = PyType_GenericNew;
-  if (PyType_Ready (&membuf_object_type) < 0)
-    return -1;
-
-  return gdb_pymodule_addobject (gdb_module, "Membuf",
-				 (PyObject *) &membuf_object_type);
+  return 0;
 }
 
 static gdb_PyGetSetDef inferior_object_getset[] =
@@ -1053,60 +934,3 @@ PyTypeObject inferior_object_type =
   0,				  /* tp_init */
   0				  /* tp_alloc */
 };
-
-#ifdef IS_PY3K
-
-static PyBufferProcs buffer_procs =
-{
-  get_buffer
-};
-
-#else
-
-static PyBufferProcs buffer_procs = {
-  get_read_buffer,
-  get_write_buffer,
-  get_seg_count,
-  get_char_buffer
-};
-#endif	/* IS_PY3K */
-
-PyTypeObject membuf_object_type = {
-  PyVarObject_HEAD_INIT (NULL, 0)
-  "gdb.Membuf",			  /*tp_name*/
-  sizeof (membuf_object),	  /*tp_basicsize*/
-  0,				  /*tp_itemsize*/
-  mbpy_dealloc,			  /*tp_dealloc*/
-  0,				  /*tp_print*/
-  0,				  /*tp_getattr*/
-  0,				  /*tp_setattr*/
-  0,				  /*tp_compare*/
-  0,				  /*tp_repr*/
-  0,				  /*tp_as_number*/
-  0,				  /*tp_as_sequence*/
-  0,				  /*tp_as_mapping*/
-  0,				  /*tp_hash */
-  0,				  /*tp_call*/
-  mbpy_str,			  /*tp_str*/
-  0,				  /*tp_getattro*/
-  0,				  /*tp_setattro*/
-  &buffer_procs,		  /*tp_as_buffer*/
-  Py_TPFLAGS_DEFAULT,		  /*tp_flags*/
-  "GDB memory buffer object", 	  /*tp_doc*/
-  0,				  /* tp_traverse */
-  0,				  /* tp_clear */
-  0,				  /* tp_richcompare */
-  0,				  /* tp_weaklistoffset */
-  0,				  /* tp_iter */
-  0,				  /* tp_iternext */
-  0,				  /* tp_methods */
-  0,				  /* tp_members */
-  0,				  /* tp_getset */
-  0,				  /* tp_base */
-  0,				  /* tp_dict */
-  0,				  /* tp_descr_get */
-  0,				  /* tp_descr_set */
-  0,				  /* tp_dictoffset */
-  0,				  /* tp_init */
-  0,				  /* tp_alloc */
-};
diff --git a/gdb/python/py-membuf.c b/gdb/python/py-membuf.c
new file mode 100644
index 00000000000..3978acec907
--- /dev/null
+++ b/gdb/python/py-membuf.c
@@ -0,0 +1,226 @@
+/* Python memory buffer interface for reading inferior memory.
+
+   Copyright (C) 2009-2021 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+
+struct membuf_object {
+  PyObject_HEAD
+
+  /* Pointer to the raw data, an array of gdb_bytes.  */
+  void *buffer;
+
+  /* The address from where the data was read, held for mbpy_str.  */
+  CORE_ADDR addr;
+
+  /* The number of octets in BUFFER.  */
+  CORE_ADDR length;
+};
+
+extern PyTypeObject membuf_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("membuf_object");
+
+/* Wrap BUFFER, ADDRESS, and LENGTH into a gdb.Membuf object.  ADDRESS is
+   the address within the inferior from which the contents of BUFFER were
+   read, and LENGTH is the number of octets in BUFFER.  */
+
+PyObject *
+gdbpy_buffer_to_membuf (gdb::unique_xmalloc_ptr<gdb_byte> buffer,
+			CORE_ADDR address,
+			ULONGEST length)
+{
+  gdbpy_ref<membuf_object> membuf_obj (PyObject_New (membuf_object,
+						     &membuf_object_type));
+  if (membuf_obj == nullptr)
+    return nullptr;
+
+  membuf_obj->buffer = buffer.release ();
+  membuf_obj->addr = address;
+  membuf_obj->length = length;
+
+  PyObject *result;
+#ifdef IS_PY3K
+  result = PyMemoryView_FromObject ((PyObject *) membuf_obj.get ());
+#else
+  result = PyBuffer_FromReadWriteObject ((PyObject *) membuf_obj.get (), 0,
+					 Py_END_OF_BUFFER);
+#endif
+
+  return result;
+}
+
+/* Destructor for gdb.Membuf objects.  */
+
+static void
+mbpy_dealloc (PyObject *self)
+{
+  xfree (((membuf_object *) self)->buffer);
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* Return a description of the gdb.Membuf object.  */
+
+static PyObject *
+mbpy_str (PyObject *self)
+{
+  membuf_object *membuf_obj = (membuf_object *) self;
+
+  return PyString_FromFormat (_("Memory buffer for address %s, \
+which is %s bytes long."),
+			      paddress (python_gdbarch, membuf_obj->addr),
+			      pulongest (membuf_obj->length));
+}
+
+#ifdef IS_PY3K
+
+static int
+get_buffer (PyObject *self, Py_buffer *buf, int flags)
+{
+  membuf_object *membuf_obj = (membuf_object *) self;
+  int ret;
+
+  ret = PyBuffer_FillInfo (buf, self, membuf_obj->buffer,
+			   membuf_obj->length, 0,
+			   PyBUF_CONTIG);
+
+  /* Despite the documentation saying this field is a "const char *",
+     in Python 3.4 at least, it's really a "char *".  */
+  buf->format = (char *) "c";
+
+  return ret;
+}
+
+#else
+
+static Py_ssize_t
+get_read_buffer (PyObject *self, Py_ssize_t segment, void **ptrptr)
+{
+  membuf_object *membuf_obj = (membuf_object *) self;
+
+  if (segment)
+    {
+      PyErr_SetString (PyExc_SystemError,
+		       _("The memory buffer supports only one segment."));
+      return -1;
+    }
+
+  *ptrptr = membuf_obj->buffer;
+
+  return membuf_obj->length;
+}
+
+static Py_ssize_t
+get_write_buffer (PyObject *self, Py_ssize_t segment, void **ptrptr)
+{
+  return get_read_buffer (self, segment, ptrptr);
+}
+
+static Py_ssize_t
+get_seg_count (PyObject *self, Py_ssize_t *lenp)
+{
+  if (lenp)
+    *lenp = ((membuf_object *) self)->length;
+
+  return 1;
+}
+
+static Py_ssize_t
+get_char_buffer (PyObject *self, Py_ssize_t segment, char **ptrptr)
+{
+  void *ptr = nullptr;
+  Py_ssize_t ret;
+
+  ret = get_read_buffer (self, segment, &ptr);
+  *ptrptr = (char *) ptr;
+
+  return ret;
+}
+
+#endif	/* IS_PY3K */
+
+/* General Python initialization callback.  */
+
+int
+gdbpy_initialize_membuf (void)
+{
+  membuf_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&membuf_object_type) < 0)
+    return -1;
+
+  return gdb_pymodule_addobject (gdb_module, "Membuf",
+				 (PyObject *) &membuf_object_type);
+}
+
+#ifdef IS_PY3K
+
+static PyBufferProcs buffer_procs =
+{
+  get_buffer
+};
+
+#else
+
+static PyBufferProcs buffer_procs = {
+  get_read_buffer,
+  get_write_buffer,
+  get_seg_count,
+  get_char_buffer
+};
+
+#endif	/* IS_PY3K */
+
+PyTypeObject membuf_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.Membuf",			  /*tp_name*/
+  sizeof (membuf_object),	  /*tp_basicsize*/
+  0,				  /*tp_itemsize*/
+  mbpy_dealloc,			  /*tp_dealloc*/
+  0,				  /*tp_print*/
+  0,				  /*tp_getattr*/
+  0,				  /*tp_setattr*/
+  0,				  /*tp_compare*/
+  0,				  /*tp_repr*/
+  0,				  /*tp_as_number*/
+  0,				  /*tp_as_sequence*/
+  0,				  /*tp_as_mapping*/
+  0,				  /*tp_hash */
+  0,				  /*tp_call*/
+  mbpy_str,			  /*tp_str*/
+  0,				  /*tp_getattro*/
+  0,				  /*tp_setattro*/
+  &buffer_procs,		  /*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,		  /*tp_flags*/
+  "GDB memory buffer object", 	  /*tp_doc*/
+  0,				  /* tp_traverse */
+  0,				  /* tp_clear */
+  0,				  /* tp_richcompare */
+  0,				  /* tp_weaklistoffset */
+  0,				  /* tp_iter */
+  0,				  /* tp_iternext */
+  0,				  /* tp_methods */
+  0,				  /* tp_members */
+  0,				  /* tp_getset */
+  0,				  /* tp_base */
+  0,				  /* tp_dict */
+  0,				  /* tp_descr_get */
+  0,				  /* tp_descr_set */
+  0,				  /* tp_dictoffset */
+  0,				  /* tp_init */
+  0,				  /* tp_alloc */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index 2ad3bc944a7..735328b49c4 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -479,6 +479,9 @@ gdbpy_ref<thread_object> create_thread_object (struct thread_info *tp);
 gdbpy_ref<> thread_to_thread_object (thread_info *thr);;
 gdbpy_ref<inferior_object> inferior_to_inferior_object (inferior *inf);
 
+PyObject *gdbpy_buffer_to_membuf (gdb::unique_xmalloc_ptr<gdb_byte> buffer,
+				  CORE_ADDR address, ULONGEST length);
+
 const struct block *block_object_to_block (PyObject *obj);
 struct symbol *symbol_object_to_symbol (PyObject *obj);
 struct value *value_object_to_value (PyObject *self);
@@ -550,6 +553,8 @@ int gdbpy_initialize_unwind (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 int gdbpy_initialize_tui ()
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
+int gdbpy_initialize_membuf (void)
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 6c1baa167d9..2c2d8c5f217 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -1876,6 +1876,7 @@ do_start_initialization ()
       || gdbpy_initialize_registers () < 0
       || gdbpy_initialize_xmethods () < 0
       || gdbpy_initialize_unwind () < 0
+      || gdbpy_initialize_membuf () < 0
       || gdbpy_initialize_tui () < 0)
     return false;
 
-- 
2.25.4



* [PATCH 4/5] gdb: add extension language print_insn hook
  2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
                   ` (2 preceding siblings ...)
  2021-10-13 21:59 ` [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file Andrew Burgess
@ 2021-10-13 21:59 ` Andrew Burgess
  2021-10-20 21:06   ` Tom Tromey
  2021-10-13 21:59 ` [PATCH 5/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
  2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
  5 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2021-10-13 21:59 UTC (permalink / raw)
  To: gdb-patches

This commit is preparation for the next commit.

In the next commit I plan to add a Python API to intercept the
print_insn calls within GDB.  Each print_insn call is responsible for
disassembling and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with NULL for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.

There should be no user visible change after this commit.
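
The control flow described above can be sketched outside of GDB as
follows.  This is only an illustration under stated assumptions, not
the code in this patch: std::optional (C++17) stands in for
gdb::optional, std::function stands in for the function pointers in
extension_language_ops, and all of the sketch_ names are invented.
Each hook either returns an instruction length or an empty value to
decline; the first hook that returns a value wins, and the built-in
disassembler is only used when every hook declines.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical hook type: return the instruction length, or an empty
// value to let the next candidate try.
using sketch_print_insn_hook
  = std::function<std::optional<int> (unsigned long address)>;

// Ask each registered hook in turn.  A null entry models an extension
// language that does not implement the hook at all.
std::optional<int>
sketch_ext_lang_print_insn (const std::vector<sketch_print_insn_hook> &hooks,
			    unsigned long address)
{
  for (const sketch_print_insn_hook &hook : hooks)
    {
      if (!hook)
	continue;
      std::optional<int> length = hook (address);
      if (length.has_value ())
	return length;
    }
  return {};
}

// Try the hooks first, then fall back to the built-in disassembler.
int
sketch_print_insn_1 (const std::vector<sketch_print_insn_hook> &hooks,
		     unsigned long address,
		     const std::function<int (unsigned long)> &builtin)
{
  assert (builtin);

  std::optional<int> length = sketch_ext_lang_print_insn (hooks, address);
  if (length.has_value ())
    return *length;

  // No hook wanted this instruction, so do it the usual way.
  return builtin (address);
}
```

Returning an optional rather than a plain int keeps "I decline" apart
from "I tried and failed", since a negative length already carries
meaning for the caller.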
---
 gdb/disasm.c         | 27 +++++++++++++++++++++++++--
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 17 +++++++++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 84 insertions(+), 3 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index c045dfc94a6..0c384c778f5 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -784,13 +784,36 @@ gdb_disassembler::~gdb_disassembler ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 int
 gdb_disassembler::print_insn (CORE_ADDR memaddr,
 			      int *branch_delay_insns)
 {
   m_err_memaddr.reset ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   if (length < 0)
     {
@@ -916,7 +939,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index 77f23e0f911..6c5cde12ffd 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -257,6 +257,21 @@ struct extension_language_ops
      or an empty option.  */
   gdb::optional<std::string> (*colorize) (const std::string &name,
 					  const std::string &contents);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassemble_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 27dce9befa0..9a002e425b1 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -893,6 +893,26 @@ ext_lang_colorize (const std::string &filename, const std::string &contents)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	(extlang->ops->print_insn (gdbarch, address, info));
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 56f57560de3..fa292a6cb4f 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -319,4 +319,21 @@ extern void get_matching_xmethod_workers
 extern gdb::optional<std::string> ext_lang_colorize
   (const std::string &filename, const std::string &contents);
 
+/* Try to disassemble a single instruction.  ADDRESS is the
+   instruction's apparent address, though bytes for the instruction
+   should be read by calling INFO->read_memory_func as we might be
+   disassembling out of a buffer.  GDBARCH is the architecture in which we
+   are performing the disassembly.
+
+   The disassembled instruction should be printed by calling
+   INFO->fprintf_func, and the length (in octets) of the disassembled
+   instruction should be returned.
+
+   If no instruction could be disassembled then an empty value is returned
+   and GDB will call gdbarch_print_insn to perform the disassembly
+   itself.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #endif /* EXTENSION_H */
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index 8ba840cba6a..92f2f5c78ef 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 2c2d8c5f217..d817bd5bf27 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -189,6 +189,8 @@ const struct extension_language_ops python_extension_ops =
   gdbpy_get_matching_xmethod_workers,
 
   gdbpy_colorize,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 /* Architecture and language to be used in callbacks from
-- 
2.25.4



* [PATCH 5/5] gdb/python: implement the print_insn extension language hook
  2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
                   ` (3 preceding siblings ...)
  2021-10-13 21:59 ` [PATCH 4/5] gdb: add extension language print_insn hook Andrew Burgess
@ 2021-10-13 21:59 ` Andrew Burgess
  2021-10-14  7:12   ` Eli Zaretskii
  2021-10-22 13:30   ` Simon Marchi
  2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
  5 siblings, 2 replies; 80+ messages in thread
From: Andrew Burgess @ 2021-10-13 21:59 UTC (permalink / raw)
  To: gdb-patches

This commit extends the Python API to include disassembler support,
and additionally provides a syntax highlighting disassembler.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

It was only once I had a working prototype that I realised I could
very easily use this to perform syntax highlighting on GDB's
disassembly output, so I've included that too.  The new commands added
are:

  set style disassembly on|off
  show style disassembly

which enable or disable disassembly syntax highlighting.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to the libopcodes disassemble_info structure.  It has
  the read-only attributes address, string, length, architecture, and
  can_emit_style_escape, and the methods read_memory, set_result, and
  memory_error.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user written disassembler.  By reading the
  attributes and calling the methods, the user can perform the
  disassembly and set the result within the DisassembleInfo instance.

  Disassembler

      This is a base class from which user written disassemblers
  should inherit.  It just provides base implementations of __init__
  and __call__, which the user written disassembler should override.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  format_address

      This wraps GDB's print_address function, converting an address
  into a string that can be placed into disassembler output.

  syntax_highlight

      This adds syntax highlighting escapes to some disassembler
  output.  Users can call this from their own custom disassemblers to
  retain syntax highlighting.  This function handles the case where
  syntax highlighting is switched off, and the case where the
  pygments library is not available.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler
  output.  The user code can call this function to have GDB
  disassemble the instruction in the normal way, and then the user can
  tweak the output before returning that as the result.  This function
  also provides a mechanism to intercept the disassembler's reads of
  memory, so the user can adjust what GDB sees when it is
  disassembling.

The included documentation provides a more detailed description of the
API.
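
To make the registration and lookup rules concrete, here is a small
stand-alone model of the registry.  It is only a sketch: it runs
without GDB, the names register and lookup are made up for the
example, and plain strings stand in for Disassembler instances.  It
mirrors the behaviour described above: one slot per architecture name
plus one global slot (key None), with the architecture specific entry
always preferred:

```python
# Stand-alone model of the registry kept by gdb.disassembler.
# Keys are architecture names, or None for the global disassembler.
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE; return the one it replaces.

    Passing None as DISASSEMBLER removes any existing entry."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture specific entry, then the global one."""
    if arch_name is None:
        return None
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)
```

For example, after register("global") and register("riscv-only",
"riscv:rv64"), lookup("riscv:rv64") finds the architecture specific
entry, while any other architecture name falls through to the global
one, regardless of the order in which the two were registered.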
---
 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  42 ++
 gdb/data-directory/Makefile.in         |   1 +
 gdb/disasm.c                           |   5 +-
 gdb/disasm.h                           |  13 +-
 gdb/doc/gdb.texinfo                    |  14 +
 gdb/doc/python.texi                    | 252 +++++++
 gdb/python/lib/gdb/disassembler.py     | 194 ++++++
 gdb/python/py-arch.c                   |   9 +
 gdb/python/py-disasm.c                 | 905 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  21 +
 gdb/python/python.c                    |  11 +-
 gdb/testsuite/gdb.base/style.exp       |  45 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 201 ++++++
 gdb/testsuite/gdb.python/py-disasm.py  | 538 +++++++++++++++
 16 files changed, 2267 insertions(+), 10 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index ec5d332c145..3981cc9507c 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -392,6 +392,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-breakpoint.c \
 	python/py-cmd.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index d001a03145d..fd1952a2f59 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -32,6 +32,12 @@ maint show internal-warning backtrace
   internal-error, or an internal-warning.  This is on by default for
   internal-error and off by default for internal-warning.
 
+set style disassembly on|off
+show style disassembly
+  If GDB is compiled with Python support, and the Python pygments
+  module is available, then, when this setting is on, disassembler
+  output will have styling applied.
+
 * Python API
 
   ** New function gdb.add_history(), which takes a gdb.Value object
@@ -49,6 +55,42 @@ maint show internal-warning backtrace
      containing all of the possible Architecture.name() values.  Each
      entry is a string.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a sub-class of
+       gdb.disassembler.Disassembler.  ARCH is either None or a
+       string containing a bfd architecture name.  DISASSEMBLER is
+       registered as a disassembler for architecture ARCH, or for all
+       architectures if ARCH is None.  The previous disassembler
+       registered for ARCH is returned, or None if there was none.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'string', 'length', 'architecture',
+       'can_emit_style_escape', and the following methods
+       'read_memory', 'set_result', and 'memory_error'.
+
+     - gdb.disassembler.format_address(ARCHITECTURE, ADDRESS), formats
+       an address into a string so that the string can be included in
+       the disassembler output.  ARCHITECTURE is a gdb.Architecture
+       object.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
 *** Changes in GDB 11
 
 * The 'set disassembler-options' command now supports specifying options
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index 888325f974e..775516a53cc 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 0c384c778f5..3a0a11ec3bb 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -752,12 +752,13 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
+				    di_read_memory_ftype read_memory_func,
+				    di_memory_error_ftype memory_error_func)
   : m_gdbarch (gdbarch)
 {
   init_disassemble_info (&m_di, file, dis_asm_fprintf);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
+  m_di.memory_error_func = memory_error_func;
   m_di.print_address_func = dis_asm_print_address;
   /* NOTE: cagney/2003-04-28: The original code, from the old Insight
      disassembler had a local optimization here.  By default it would
diff --git a/gdb/disasm.h b/gdb/disasm.h
index f6de33e3db8..eca116c98f8 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -41,6 +41,7 @@ struct ui_file;
 class gdb_disassembler
 {
   using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using di_memory_error_ftype = decltype (disassemble_info::memory_error_func);
 
 public:
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
@@ -59,11 +60,21 @@ class gdb_disassembler
 
 protected:
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
+		    di_read_memory_ftype read_memory_func)
+    : gdb_disassembler (gdbarch, file, read_memory_func,
+			dis_asm_memory_error)
+  { /* Nothing.  */ }
+
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    di_read_memory_ftype read_memory_func,
+		    di_memory_error_ftype memory_error_func);
 
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 private:
   struct gdbarch *m_gdbarch;
 
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 631a7c03b31..9af415cc018 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -26071,6 +26071,20 @@
 
 @item show style sources
 Show the current state of source code styling.
+
+@item set style disassembly @samp{on|off}
+Enable or disable disassembly styling.  This affects whether
+disassembly output, such as the output of the @code{disassemble}
+command, is styled.  The default is @samp{on}.  Note that disassembly
+styling only works if styling in general is enabled, and if a source
+highlighting library is available to @value{GDBN}.
+
+To highlight disassembly output @value{GDBN} must be compiled with
+Python support, and the Python Pygments package must be available.
+
+@item show style disassembly
+Show the current state of disassembly styling.
+
 @end table
 
 Subcommands of @code{set style} control specific forms of styling.
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index 04192f906c8..808934aea73 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -221,6 +221,7 @@
 * Architectures In Python::     Python representation of architectures.
 * Registers In Python::         Python representation of registers.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python.
 @end menu
 
 @node Basic Python
@@ -557,6 +558,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3136,6 +3138,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6075,6 +6078,255 @@
 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex Python Instruction Disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+storing the result within the instance of this class.  The following
+attributes and methods are available:
+
+@defivar DisassembleInfo address
+An integer containing the address at which @value{GDBN} wishes to
+disassemble a single instruction.
+@end defivar
+
+@defivar DisassembleInfo string
+A string that is the result of the disassembly.  If no result has yet
+been set then this field contains @code{None}.
+@end defivar
+
+@defivar DisassembleInfo length
+An integer that is the length of the disassembled instruction in
+bytes, or @code{None} if no result has yet been set for this
+instruction.
+
+When a result has been set then the length will always be a non-zero
+positive integer.
+@end defivar
+
+@defivar DisassembleInfo architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defivar DisassembleInfo can_emit_style_escape
+This is @code{True} if the output stream that the disassembler is
+currently printing to supports the escape sequences used for colors,
+otherwise this attribute is @code{False}.
+@end defivar
+
+@defmethod DisassembleInfo read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from @code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).
+@end defmethod
+
+@defmethod DisassembleInfo set_result (length, string)
+This method is used to set the result after an instruction has
+successfully been disassembled.  The @var{length} is the length in
+bytes of the instruction, and @var{string} is the text that should be
+displayed for the disassembled output.
+
+The @var{length} must be greater than zero, and @var{string} must be a
+non-empty string.
+
+It is valid to call this method multiple times during the disassembly
+of a single instruction, each call replaces the previous result.  In
+this way it is possible to extend the output of a previous
+disassembler.
+
+If @code{DisassembleInfo.memory_error} has previously been called,
+then calling @code{DisassembleInfo.set_result} clears the memory error
+from this @code{DisassembleInfo}.
+@end defmethod
+
+@defmethod DisassembleInfo memory_error (offset)
+This method marks the @code{DisassembleInfo} as having experienced a
+@code{gdb.MemoryError} when trying to access memory at @var{offset}
+bytes from @code{DisassembleInfo.address}.
+
+It is valid to call @code{DisassembleInfo.memory_error} multiple times
+for a single instruction disassembly, but only the first memory error
+is recorded.
+
+If @code{DisassembleInfo.set_result} has already been called, then any
+result is discarded when @code{DisassembleInfo.memory_error} is
+called.
+@end defmethod
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defmethod Disassembler __init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.  Currently, this name is only used in some
+debug output.
+@end defmethod
+
+@defmethod Disassembler __call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+This function must return @code{None}.  If this function raises a
+@code{gdb.MemoryError} exception then @value{GDBN} will ignore the
+exception and fall back to using its builtin disassembler.  Raising any
+other exception is an error.
+@end defmethod
+@end deftp
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a @code{Disassembler} sub-class.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name()}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names()}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used before a global
+disassembler.
+@end defun
+
+@defun format_address (architecture, address)
+Returns @var{address} formatted as a string, in a style suitable for
+including in the disassembly output of an instruction, for example a
+formatted address might look like:
+
+@smallexample
+0x00001042 <symbol+16>
+@end smallexample
+
+@var{architecture} is a @code{gdb.Architecture} (@pxref{Architectures
+In Python}), which is required to format the addresses correctly.
+This can be obtained from @code{DisassembleInfo.architecture}.
+@end defun
+
+@defun syntax_highlight (info)
+This function can be used to apply syntax highlighting to the result
+already held within @var{info}, a @code{DisassembleInfo}.
+
+After calling this function the result in @var{info} @emph{might} have
+been updated to include syntax highlighting escape sequences.  If
+syntax highlighting is disabled in @value{GDBN}, or the output stream
+doesn't support syntax highlighting, then this function will leave
+@var{info} unchanged.
+
+If @var{info} doesn't have a result set when this function is called
+then @var{info} will not be modified.
+
+This function returns @code{None}.
+@end defun
+
+@defun builtin_disassemble (info, memory_source)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo}.
+
+After calling this function, if the instruction disassembled
+successfully, then @var{info} will have been updated as though
+@code{DisassembleInfo.set_result} had been called.  The results of the
+builtin disassembler can be examined by reading
+@code{DisassembleInfo.length} and @code{DisassembleInfo.string}.
+
+If the builtin disassembler fails then this function will raise a
+@code{gdb.MemoryError} exception.
+
+The optional @var{memory_source} argument has the default value of
+@code{None}, in which case, the builtin disassembler will read the
+instruction from memory in the normal way.
+
+If @var{memory_source} is not @code{None}, then it should be an
+instance of a class that implements the following method:
+
+@defmethod memory_source read_memory (length, offset)
+This method will be called by the builtin disassembler to fetch bytes
+of the instruction being disassembled.  @var{length} is the number of
+bytes to fetch, and @var{offset} is the offset from the address of the
+instruction being disassembled, this address is obtained from
+@code{DisassembleInfo.address}.
+
+This function should return a Python object that supports the buffer
+protocol, i.e. a string, an array, or the object returned from
+@code{DisassembleInfo.read_memory}.
+
+The length of the returned buffer @emph{must} be @var{length}
+otherwise a @code{ValueError} exception will be raised.
+
+Alternatively, this function can raise a @code{gdb.MemoryError}
+exception to indicate that the read failed, raising any other
+exception type is an error.
+@end defmethod
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output, before
+finally applying syntax highlighting to the result:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        if info.string is not None:
+            tmp = info.string + "\t## Comment"
+            info.set_result(info.length, tmp)
+            gdb.disassembler.syntax_highlight(info)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..9cf247a89e7
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,194 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+_disassembly_registry = {}
+
+# Module global callback.  This is the entry point that GDB calls, but
+# only if this is a callable thing.
+#
+# Initially we set this to None, so GDB will not try to call into any
+# Python code.
+#
+# When Python disassemblers are registered into _disassembly_registry
+# then this will be set to something callable.
+_print_insn = None
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler.  ARCHITECTURE is either
+    None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassembly_registry:
+        old = _disassembly_registry[architecture]
+        del _disassembly_registry[architecture]
+    if disassembler is not None:
+        _disassembly_registry[architecture] = disassembler
+
+    global _print_insn
+    if len(_disassembly_registry) > 0:
+        _print_insn = _perform_disassembly
+    else:
+        _print_insn = None
+
+    return old
+
+
+def _lookup_disassembler(arch):
+    try:
+        name = arch.name()
+        if name is None:
+            return None
+        if name in _disassembly_registry:
+            return _disassembly_registry[name]
+        if None in _disassembly_registry:
+            return _disassembly_registry[None]
+        return None
+    except Exception:
+        return None
+
+
+def _perform_disassembly(info):
+    disassembler = _lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class StyleDisassembly(gdb.Parameter):
+    def __init__(self):
+        super(StyleDisassembly, self).__init__(
+            "style disassembly", gdb.COMMAND_NONE, gdb.PARAM_BOOLEAN
+        )
+        self.value = True
+        self._pygments_module_available = True
+
+    def get_show_string(self, sval):
+        return 'Disassembly styling is "%s".' % sval
+
+    def get_set_string(self):
+        if not self._pygments_module_available and self.value:
+            self.value = False
+            return "Python pygments module is not available"
+        return ""
+
+    def failed_to_load_pygments(self):
+        self.value = False
+        self._pygments_module_available = False
+
+    def __bool__(self):
+        return self.value
+
+    def __nonzero__(self):
+        if self.value:
+            return 1
+        else:
+            return 0
+
+
+style_disassembly_param = StyleDisassembly()
+
+try:
+    from pygments import formatters, lexers, highlight
+
+    _lexer = lexers.get_lexer_by_name("asm")
+    _formatter = formatters.TerminalFormatter()
+
+    def syntax_highlight(info):
+        # If we should not be performing syntax highlighting, or if
+        # INFO does not hold a result, then there's nothing to do.
+        if (
+            not gdb.parameter("style enabled")
+            or not style_disassembly_param
+            or not info.can_emit_style_escape
+            or info.string is None
+        ):
+            return
+        # Now apply the highlighting, and update the result.
+        styled = highlight(info.string, _lexer, _formatter)
+        info.set_result(info.length, styled.strip())
+
+    class _SyntaxHighlightingDisassembler(Disassembler):
+        """A syntax highlighting disassembler."""
+
+        def __init__(self, name):
+            """Constructor."""
+            super(_SyntaxHighlightingDisassembler, self).__init__(name)
+
+        def __call__(self, info):
+            """Invoke the builtin disassembler, and syntax highlight the result."""
+            gdb.disassembler.builtin_disassemble(info)
+            gdb.disassembler.syntax_highlight(info)
+
+    register_disassembler(
+        _SyntaxHighlightingDisassembler("syntax_highlighting_disassembler")
+    )
+
+except:
+
+    # Update the 'set/show style disassembly' parameter now we know
+    # that the pygments module can't be loaded.
+    style_disassembly_param.failed_to_load_pygments()
+
+    def syntax_highlight(info):
+        # An implementation of syntax_highlight that can safely be
+        # called even when syntax highlighting is not available.
+        # This just returns, leaving INFO unmodified.
+        return
diff --git a/gdb/python/py-arch.c b/gdb/python/py-arch.c
index 3e7970ab764..1855f3daab3 100644
--- a/gdb/python/py-arch.c
+++ b/gdb/python/py-arch.c
@@ -72,6 +72,15 @@ arch_object_to_gdbarch (PyObject *obj)
   return py_arch->gdbarch;
 }
 
+/* See python-internal.h.  */
+
+bool
+gdbpy_is_arch_object (PyObject *obj)
+{
+  gdb_assert (obj != nullptr);
+  return PyObject_TypeCheck (obj, &arch_object_type);
+}
+
 /* Returns the Python architecture object corresponding to GDBARCH.
    Returns a new reference to the arch_object associated as data with
    GDBARCH.  */
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..3327e532270
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,905 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2008-2021 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object {
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  disassemble_info *gdb_info;
+  disassemble_info *py_info;
+
+  /* The length of the disassembled instruction, a value of -1 indicates
+     that there is no disassembly result set, otherwise, this should be a
+     value greater than zero.  */
+  int length;
+
+  /* A string buffer containing the disassembled instruction.  This is
+     initially nullptr, and is allocated when needed.  It is possible that
+     the length field (above) can be -1, but this buffer is still
+     allocated, this happens if the user first sets a result, and then
+     marks a memory error.  In this case any value in CONTENT should be
+     ignored.  */
+  string_file *content;
+
+  /* When the user indicates that a memory error has occurred then this
+     field is set to true, it is false by default.  */
+  bool memory_error_address_p;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  This field is only
+     valid when MEMORY_ERROR_ADDRESS_P is true, otherwise this field is
+     undefined.  */
+  CORE_ADDR memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *memory_source;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+typedef int (*read_memory_ftype)
+    (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+     struct disassemble_info *dinfo);
+
+/* A sub-class of gdb_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (struct gdbarch *gdbarch, struct ui_file *stream,
+		      disasm_info_object *obj);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Mark this class as a friend so that it can call the disasm_info
+     method, which is protected in our parent.  */
+  friend class scoped_disasm_info_object;
+
+private:
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasmpy_info_is_valid (disasm_info_object *obj)
+{
+  if (obj->gdb_info == nullptr)
+    gdb_assert (obj->py_info == nullptr);
+  else
+    gdb_assert (obj->py_info != nullptr);
+
+  return obj->gdb_info != nullptr;
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasmpy_info_is_valid (Info))					\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Mark OBJ as having a memory error at ADDR.  Only the first memory error
+   is recorded, so if OBJ has already had a memory error set then this
+   call will have no effect.  */
+
+static void
+disasmpy_set_memory_error (disasm_info_object *obj, CORE_ADDR addr)
+{
+  if (!obj->memory_error_address_p)
+    {
+      obj->memory_error_address = addr;
+      obj->memory_error_address_p = true;
+    }
+}
+
+/* Clear any memory error already set on OBJ.  If there is no memory error
+   set on OBJ then this call has no effect.  */
+
+static void
+disasmpy_clear_memory_error (disasm_info_object *obj)
+{
+  obj->memory_error_address_p = false;
+}
+
+/* Clear any previous disassembler result stored within OBJ.  If there was
+   no previous disassembler result then calling this function has no
+   effect.  */
+
+static void
+disasmpy_clear_disassembler_result (disasm_info_object *obj)
+{
+  obj->length = -1;
+  gdb_assert (obj->content != nullptr);
+  obj->content->clear ();
+}
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.  Returns the Python None value.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  if (!disasmpy_info_is_valid (disasm_info))
+    {
+      PyErr_SetString (PyExc_RuntimeError,
+		       _("DisassembleInfo is no longer valid."));
+      return nullptr;
+    }
+
+  gdb::optional<scoped_restore_tmpl<PyObject *>> restore_memory_source;
+
+  disassemble_info *info = disasm_info->py_info;
+  if (memory_source_obj != nullptr)
+    {
+      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
+	{
+	  PyErr_SetString (PyExc_TypeError,
+			   _("memory_source doesn't have a read_memory method"));
+	  return nullptr;
+	}
+
+      gdb_assert (disasm_info->memory_source == nullptr);
+      restore_memory_source.emplace (&disasm_info->memory_source,
+				     memory_source_obj);
+    }
+
+  /* When the user calls the builtin disassembler any previous result or
+     memory error is discarded, and we start fresh.  */
+  disasmpy_clear_disassembler_result (disasm_info);
+  disasmpy_clear_memory_error (disasm_info);
+
+  /* Now actually perform the disassembly.  */
+  disasm_info->length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address, info);
+
+  if (disasm_info->length == -1)
+    {
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers that don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disasm_info->memory_error_address_p)
+	{
+	  CORE_ADDR addr = disasm_info->memory_error_address;
+	  PyErr_Format (gdbpy_gdb_memory_error,
+			"failed to read memory at %s",
+			core_addr_to_string (addr));
+	}
+      else
+	PyErr_Format (gdbpy_gdb_memory_error, "failed to read memory");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (disasm_info->length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disasm_info->memory_error_address_p);
+
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      PyErr_Format (gdbpy_gdb_memory_error,
+		    "failed to read %s bytes at %s",
+		    pulongest ((ULONGEST) length),
+		    core_addr_to_string (address));
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.set_result(LENGTH, STRING).  Discard any
+   previous memory error and set the result of this disassembly to be
+   STRING, a LENGTH bytes long instruction.  The LENGTH must be greater
+   than zero otherwise a ValueError exception is raised.  STRING must be a
+   non-empty string, or a ValueError exception is raised.  */
+
+static PyObject *
+disasmpy_info_set_result (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  static const char *keywords[] = { "length", "string", nullptr };
+  int length;
+  const char *string;
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "is", keywords,
+					&length, &string))
+    return nullptr;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return nullptr;
+    }
+
+  size_t string_len = strlen (string);
+  if (string_len == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("String must not be empty."));
+      return nullptr;
+    }
+
+  /* Discard any previously recorded memory error, and any previous
+     disassembler result.  */
+  disasmpy_clear_memory_error (obj);
+  disasmpy_clear_disassembler_result (obj);
+
+  /* And set the result.  */
+  obj->length = length;
+  gdb_assert (obj->content != nullptr);
+  obj->content->write (string, string_len);
+
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.memory_error(OFFSET).  Record a memory error
+   at OFFSET from SELF's address.  Any previous result is discarded.  */
+
+static PyObject *
+disasmpy_info_memory_error (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  static const char *keywords[] = { "offset", nullptr };
+  LONGEST offset;
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L", keywords,
+					&offset))
+    return nullptr;
+
+  /* Discard any previous disassembler result, and mark OBJ as having a
+     memory error.  */
+  disasmpy_clear_disassembler_result (obj);
+  disasmpy_set_memory_error (obj, obj->address + offset);
+
+  Py_RETURN_NONE;
+}
+
+/* Implement gdb.disassembler.format_address(ARCH, ADDR).  Format ADDR, an
+   address, and return a string.  ADDR will be formatted in the style that
+   the disassembler uses: '0x.... <symbol + offset>'.  ARCH is a
+   gdb.Architecture used to perform the formatting.  */
+
+static PyObject *
+disasmpy_format_address (PyObject *self, PyObject *args, PyObject *kw)
+{
+  static const char *keywords[] = { "architecture", "address", nullptr };
+  PyObject *addr_obj, *arch_obj;
+  CORE_ADDR addr;
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "OO", keywords,
+					&arch_obj, &addr_obj))
+    return nullptr;
+
+  if (get_addr_from_python (addr_obj, &addr) < 0)
+    return nullptr;
+
+  if (!gdbpy_is_arch_object (arch_obj))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("architecture argument is not a gdb.Architecture"));
+      return nullptr;
+    }
+
+  gdbarch *gdbarch = arch_object_to_gdbarch (arch_obj);
+  if (gdbarch == nullptr)
+    {
+      PyErr_SetString (PyExc_RuntimeError,
+		       _("architecture argument is invalid."));
+      return nullptr;
+    }
+
+  string_file buf;
+  print_address (gdbarch, addr, &buf);
+  return PyString_FromString (buf.c_str ());
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.string attribute.  Return a string containing
+   the current disassembly result, or None if there is no current
+   disassembly result.  */
+
+static PyObject *
+disasmpy_info_string (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  gdb_assert (obj->content != nullptr);
+  if (strlen (obj->content->c_str ()) == 0)
+    Py_RETURN_NONE;
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassembleInfo.length attribute.  Return the length of the
+   current disassembled instruction, as set by a call to
+   DisassembleInfo.set_result.  If no result has been set yet, or if a call
+   to DisassembleInfo.memory_error has invalidated the result, then None is
+   returned.  */
+
+static PyObject *
+disasmpy_info_length (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  if (obj->length == -1)
+    Py_RETURN_NONE;
+  gdb_assert (obj->length > 0);
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.can_emit_style_escape attribute.  Returns True
+   if the output stream that the disassembly result will be written to
+   supports style escapes, otherwise, returns False.  */
+
+static PyObject *
+disasmpy_info_can_emit_style_escape (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  bool can_emit_style_escape = current_uiout->can_emit_style_escape ();
+  return PyBool_FromLong (can_emit_style_escape ? 1 : 0);
+}
+
+/* This implements the disassemble_info read_memory_func callback.  This
+   will either call the standard read memory function, or, if the user has
+   supplied a memory source (see disasmpy_builtin_disassemble) then this
+   will call back into Python to obtain the memory contents.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+static int
+disasmpy_read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			  unsigned int len, struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+
+  /* The simple case, the user didn't pass a separate memory source, so we
+     just delegate to the standard disassemble_info read_memory_func.  */
+  if (obj->memory_source == nullptr)
+    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
+
+  /* The user provided a separate memory source, we need to call the
+     read_memory method on the memory source and use the buffer it returns
+     as the bytes of memory.  */
+  PyObject *memory_source = obj->memory_source;
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
+					       "IL", len, offset));
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  For any other exception type
+	 we assume this is a bug in the user's code, print the stack, and
+	 then report the read failed.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Result from read_memory is incorrectly sized buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying DisassembleInfo Python object, and set a memory error on
+   it.  */
+
+static void
+disasmpy_memory_error_func (int status, bfd_vma memaddr,
+			   struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+  disasmpy_set_memory_error (obj, memaddr);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (struct gdbarch *gdbarch,
+					struct ui_file *stream,
+					disasm_info_object *obj)
+  : gdb_disassembler (gdbarch, stream, disasmpy_read_memory_func,
+		      disasmpy_memory_error_func),
+    m_disasm_info_object (obj)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, along
+   with some supporting information that the DisassembleInfo object needs
+   to reference.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			 disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ()),
+      m_py_disassembler (gdbarch, &m_string_file, m_disasm_info.get ())
+  {
+    m_disasm_info->address = memaddr;
+    m_disasm_info->gdb_info = info;
+    m_disasm_info->py_info = m_py_disassembler.disasm_info ();
+    m_disasm_info->length = -1;
+    m_disasm_info->content = &m_string_file;
+    m_disasm_info->gdbarch = gdbarch;
+    m_disasm_info->memory_error_address_p = false;
+    m_disasm_info->memory_error_address = 0;
+    m_disasm_info->memory_source = nullptr;
+  }
+
+  /* Upon destruction clear pointers to state that will no longer be
+     valid.  These fields are checked in disasmpy_info_is_valid to see if
+     the disasm_info_object is still valid or not.  */
+  ~scoped_disasm_info_object ()
+  {
+    m_disasm_info->gdb_info = nullptr;
+    m_disasm_info->py_info = nullptr;
+    m_disasm_info->content = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+
+  /* A location into which the output of the Python disassembler is
+     collected.  We only send this back to GDB once the Python disassembler
+     has completed successfully.  */
+  string_file m_string_file;
+
+  /* Core GDB requires that the disassemble_info application_data field be
+     an instance of, or a sub-class of, gdb_disassembler.  We use a
+     sub-class so that functions within this file can obtain a pointer to
+     the disasm_info_object from the application_data.  */
+  gdbpy_disassembler m_py_disassembler;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  if (!gdb_python_initialized)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* The attribute we are going to lookup that provides the print_insn
+     functionality.  */
+  static const char *callback_name = "_print_insn";
+
+  /* Grab a reference to the gdb.disassembler module, and check it has the
+     attribute that we need.  */
+  static gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr
+      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
+				  callback_name))
+    return {};
+
+  /* Now grab the callback attribute from the module, and check that it is
+     callable.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     callback_name));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!PyCallable_Check (hook.get ()))
+    return {};
+
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* Uncaught memory errors are not printed, we assume that the
+	     user tried to read some bytes for their custom disassembler,
+	     but the bytes were not available; as such, we should silently
+	     fall back to using the builtin disassembler, which is what
+	     happens when we return no value here.  */
+	  PyErr_Clear ();
+	}
+      else
+	{
+	  /* Any other error while executing the _print_insn callback
+	     should result in a debug stack being printed, then we return
+	     no value to indicate that the builtin disassembler should be
+	     used.  */
+	  gdbpy_print_stack ();
+	}
+      return {};
+    }
+  else if (result != Py_None)
+    error (_("invalid return value from gdb.disassembler._print_insn"));
+
+  if (disasm_info->memory_error_address_p)
+    {
+      /* We pass -1 for the status here.  GDB doesn't make use of this
+	 field, but disassemblers usually pass the result of
+	 read_memory_func as the status, in which case -1 indicates an
+	 error.  */
+      bfd_vma addr = disasm_info->memory_error_address;
+      info->memory_error_func (-1, addr, info);
+      return gdb::optional<int> (-1);
+    }
+
+  /* If the gdb.disassembler.DisassembleInfo object doesn't have a result
+     then return no value, GDB will use the builtin disassembler.  */
+  if (disasm_info->length == -1)
+    return {};
+
+  /* Print the content from the DisassembleInfo back through to GDB's
+     standard fprintf_func handler.  */
+  info->fprintf_func (info->stream, "%s", disasm_info->content->c_str ());
+
+  /* Return the length of this instruction.  */
+  return gdb::optional<int> (disasm_info->length);
+}
+
+/* The tp_dealloc callback for the DisassembleInfo type.  The content
+   buffer is not owned by the object, so only the object itself is freed.  */
+
+static void
+disasmpy_dealloc (PyObject *self)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+
+  /* The memory_source field is only ever temporarily set to non-nullptr
+     during the disasmpy_builtin_disassemble function.  By the end of that
+     function the memory_source field should be back to nullptr.  */
+  gdb_assert (obj->memory_source == nullptr);
+
+  /* The content field will also be reset to nullptr by the end of
+     gdbpy_print_insn, so the following assert should hold.  */
+  gdb_assert (obj->content == nullptr);
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "string", disasmpy_info_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { "length", disasmpy_info_length, nullptr,
+    "Length in octets of the disassembled instruction.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "can_emit_style_escape", disasmpy_info_can_emit_style_escape, nullptr,
+    "Boolean indicating if style escapes can be emitted", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "set_result", (PyCFunction) disasmpy_info_set_result,
+    METH_VARARGS | METH_KEYWORDS,
+    "set_result (LENGTH, STRING) -> None\n\
+Set the disassembly result, LENGTH in octets, and disassembly STRING." },
+  { "memory_error", (PyCFunction) disasmpy_info_memory_error,
+    METH_VARARGS | METH_KEYWORDS,
+    "memory_error (OFFSET) -> None\n\
+A memory error occurred when trying to read bytes at OFFSET." },
+  {nullptr}  /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "format_address", (PyCFunction) disasmpy_format_address,
+    METH_VARARGS | METH_KEYWORDS,
+    "format_address (ARCHITECTURE, ADDRESS) -> String.\n\
+Format ADDRESS as a string suitable for use in disassembler output." },
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> None\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+#ifdef IS_PY3K
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+#endif
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm (void)
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+#ifdef IS_PY3K
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+#else
+  gdb_disassembler_module = Py_InitModule ("_gdb.disassembler",
+					   python_disassembler_methods);
+#endif
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  /* Having the tp_new field as nullptr means that instances of this class
+     can't be created from user code.  The only way they can be created is
+     from within GDB, and then they are passed into user code.  */
+  gdb_assert (disasm_info_object_type.tp_new == nullptr);
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  return gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+				 (PyObject *) &disasm_info_object_type);
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc,                		/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,				/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset			/* tp_getset */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index 735328b49c4..d0330c81079 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -497,6 +497,8 @@ int gdbpy_initialize_auto_load (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 int gdbpy_initialize_values (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
+int gdbpy_initialize_disasm (void)
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 int gdbpy_initialize_frames (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 int gdbpy_initialize_instruction (void)
@@ -798,4 +800,23 @@ typedef std::unique_ptr<Py_buffer, Py_buffer_deleter> Py_buffer_up;
 extern bool gdbpy_parse_register_id (struct gdbarch *gdbarch,
 				     PyObject *pyo_reg_id, int *reg_num);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
+/* Return true if OBJ is a gdb.Architecture object, otherwise, return
+   false.  */
+
+bool gdbpy_is_arch_object (PyObject *obj);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index d817bd5bf27..3aba565cd11 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -190,7 +190,7 @@ const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 /* Architecture and language to be used in callbacks from
@@ -1852,6 +1852,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
@@ -2130,6 +2131,14 @@ do_initialize (const struct extension_language_defn *extlang)
       return true;
     }
 
+  /* Import gdb.disassembler now.  The disassembler module provides some
+     parameters that we want to be available to users from the moment GDB
+     starts up.  */
+  gdbpy_ref<> gdb_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_disassembler_module == nullptr)
+    gdbpy_print_stack ();
+
   return gdb_pymodule_addobject (m, "gdb", gdb_python_module) >= 0;
 }
 
diff --git a/gdb/testsuite/gdb.base/style.exp b/gdb/testsuite/gdb.base/style.exp
index 91d3059612d..7aa51cdfe00 100644
--- a/gdb/testsuite/gdb.base/style.exp
+++ b/gdb/testsuite/gdb.base/style.exp
@@ -182,12 +182,26 @@ proc run_style_tests { } {
 
 	gdb_test_no_output "set width 0"
 
-	set main [limited_style main function]
-	set func [limited_style some_called_function function]
-	# Somewhere should see the call to the function.
-	gdb_test "disassemble main" \
-	    [concat "Dump of assembler code for function $main:.*" \
-		 "[limited_style $hex address].*$func.*"]
+	# Disassembly highlighting is done by Python, so, if the
+	# required modules are not available we'll not get the full
+	# highlighting.
+	if { $::python_disassembly_highlighting } {
+	    # Check that the header line of the disassembly output is
+	    # styled correctly, the address at the start of the first
+	    # disassembly line is styled correctly, and that there is at
+	    # least one escape sequence in the disassembly output.
+	    set main [limited_style main function]
+	    gdb_test "disassemble main" \
+		[concat "Dump of assembler code for function $main:\\r\\n" \
+		     "\\s+[limited_style $hex address]\\s+<\\+$decimal>:\[^\\r\\n\]+\033\\\[${decimal}\[^\\r\\n\]+.*" ""]
+	} else {
+	    set main [limited_style main function]
+	    set func [limited_style some_called_function function]
+	    # Somewhere should see the call to the function.
+	    gdb_test "disassemble main" \
+		[concat "Dump of assembler code for function $main:.*" \
+		     "[limited_style $hex address].*$func.*"]
+	}
 
 	set ifield [limited_style int_field variable]
 	set sfield [limited_style string_field variable]
@@ -312,6 +326,25 @@ proc test_startup_version_string { } {
     gdb_test "" "${vers}.*" "version is styled at startup"
 }
 
+# Check to see if the Python highlighting of disassembler output is
+# expected or not.  This highlighting requires Python support in GDB,
+# and the Python pygments module to be available.
+clean_restart ${binfile}
+if {![skip_python_tests]} {
+    gdb_test_multiple "python import pygments" "" {
+	-re "ModuleNotFoundError: No module named 'pygments'.*$gdb_prompt $" {
+	    set python_disassembly_highlighting false
+	}
+	-re "ImportError: No module named pygments.*$gdb_prompt $" {
+	    set python_disassembly_highlighting false
+	}
+	-re "^python import pygments\r\n$gdb_prompt $" {
+	    set python_disassembly_highlighting true
+	}
+    }
+} else {
+    set python_disassembly_highlighting false
+}
 
 # Run tests with all styles in their default state.
 with_test_prefix "all styles enabled" {
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..1d89a49c346
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..f8d6140036d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,201 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements, the
+# first element is the name of a class in py-disasm.py, this is a
+# disassembler class.  The second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, st = None, le = None, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, st = nop, le = $decimal, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalEscDisassembler" "${base_pattern}\\s+## style = False\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "SimpleMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: error before setting a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: error after setting a result\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "CaughtMemoryErrorEarlyDisassembler" "${addr_pattern}Cannot access memory at address 0x2"] \
+	 [list "CaughtMemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address 0x2"] \
+	 [list "CaughtMemoryErrorEarlyAndReplaceDisassembler" "${base_pattern}\\s+## tag = GOT MEMORY ERROR\r\n.*"] \
+	 [list "SetResultBeforeBuiltinDisassembler" "${base_pattern}\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(analyzing_disassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
+
+# The syntax highlighting disassembler makes use of the pygments
+# module.  Try importing the module now, if this fails then we can
+# skip the tests that check the syntax highlighting.
+gdb_test_multiple "python import pygments" "" {
+    -re "ModuleNotFoundError: No module named 'pygments'.*$gdb_prompt $" {
+	set pygments_module_available false
+    }
+    -re "ImportError: No module named pygments.*$gdb_prompt $" {
+	set pygments_module_available false
+    }
+    -re "^python import pygments\r\n$gdb_prompt $" {
+	set pygments_module_available true
+    }
+}
+
+if { $pygments_module_available } {
+    # Test the syntax highlighting disassembler.
+    with_test_prefix "syntax highlighting" {
+	py_remove_all_disassemblers
+	save_vars { env(TERM) } {
+	    # We need an ANSI-capable terminal to get the output.
+	    setenv TERM ansi
+
+	    clean_restart ${binfile}
+
+	    if ![runto_main] then {
+		fail "can't run to main"
+		return 0
+	    }
+
+	    gdb_test "source ${pyfile}" "Python script imported" \
+		"import python scripts"
+
+	    gdb_breakpoint [gdb_get_line_number "Break here."]
+	    gdb_continue_to_breakpoint "Break here."
+
+	    gdb_test_no_output "python current_pc = ${curr_pc}"
+
+	    gdb_test_no_output "python add_global_disassembler(GlobalColorDisassembler)"
+	    set styled_nop "\033\\\[\[0-9\]+(;\[0-9\]+)?mnop\033\\\[\[^m\]+m"
+	    set styled_address [style "${curr_pc_pattern}" address]
+	    gdb_test "disassemble main" "\r\n=> ${styled_address} <\[^>\]+>:\\s+${styled_nop}\r\n.*"
+	}
+    }
+} else {
+    untested "disassemble with styling"
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..2cfcb7ceaff
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,538 @@
+# Copyright (C) 2021 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+       Implements the __call__ method and ensures we only do any
+       disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super(TestDisassembler, self).__init__("TestDisassembler")
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        return self.disassemble(info)
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        st = info.string
+        le = info.length
+        ar = info.architecture
+
+        if le is not None:
+            raise gdb.GdbError("invalid length")
+
+        if st is not None:
+            raise gdb.GdbError("invalid string")
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        gdb.disassembler.builtin_disassemble(info)
+
+        text = info.string + "\t## ad = 0x%x, st = %s, le = %s, ar = %s" % (
+            ad,
+            st,
+            le,
+            ar.name(),
+        )
+        info.set_result(info.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        st = info.string
+        le = info.length
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if st is None or st == "":
+            raise gdb.GdbError("invalid string")
+
+        if le <= 0:
+            raise gdb.GdbError("invalid length")
+
+        text = info.string + "\t## ad = 0x%x, st = %s, le = %d, ar = %s" % (
+            ad,
+            st,
+            le,
+            ar.name(),
+        )
+        info.set_result(info.length, text)
+
+
+class GlobalEscDisassembler(TestDisassembler):
+    """Check the can_emit_style_escape attribute."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        text = info.string + "\t## style = %s" % info.can_emit_style_escape
+        info.set_result(info.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        length = info.length
+        byte_str = ""
+        for o in range(length):
+            if byte_str != "":
+                byte_str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)[0]
+            byte_str += "0x%02x" % v
+        text = info.string + "\t## bytes = %s" % byte_str
+        info.set_result(info.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.disassembler.format_address method."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        addr_str = gdb.disassembler.format_address(arch, addr)
+        text = info.string + "\t## addr = %s" % addr_str
+        info.set_result(info.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw an error (not a memory error) before setting a result."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("error before setting a result")
+        gdb.disassembler.builtin_disassemble(info)
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw an error (not a memory error) after setting a result."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("error after setting a result")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error before setting a result."""
+
+    def disassemble(self, info):
+        info.read_memory(1, -info.address + 2)
+        gdb.disassembler.builtin_disassemble(info)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after setting a result."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        info.read_memory(1, -info.address + 2)
+
+
+class SimpleMemoryErrorDisassembler(TestDisassembler):
+    """Some basic testing around setting memory errors, ensure that the
+    length and string return to None after setting a memory error."""
+
+    def disassemble(self, info):
+        if info.length is not None:
+            raise gdb.GdbError("length is not None before")
+        if info.string is not None:
+            raise gdb.GdbError("string is not None before")
+        info.set_result(1, "!! INVALID !! ")
+        info.memory_error(0)
+        if info.length is not None:
+            raise gdb.GdbError("length is not None after")
+        if info.string is not None:
+            raise gdb.GdbError("string is not None after")
+
+
+class CaughtMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error before setting a result."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            info.memory_error(-info.address + 2)
+            return None
+        gdb.disassembler.builtin_disassemble(info)
+
+
+class CaughtMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after setting a result."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            # This memory error will discard the earlier result and
+            # mark this disassembly as failed with a memory error.
+            info.memory_error(-info.address + 2)
+
+
+class SetResultBeforeBuiltinDisassembler(TestDisassembler):
+    """Set a result, then call the builtin disassembler."""
+
+    def disassemble(self, info):
+        info.set_result(1, "!! DISCARD THIS TEXT !! ")
+        gdb.disassembler.builtin_disassemble(info)
+
+
+class CaughtMemoryErrorEarlyAndReplaceDisassembler(TestDisassembler):
+    """Catch a memory error, report it, then replace it with a builtin result."""
+
+    def disassemble(self, info):
+        tag = "NO MEMORY ERROR"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            info.memory_error(-info.address + 2)
+            tag = "GOT MEMORY ERROR"
+
+        # This disassembly will replace the earlier memory error
+        # marker, and leave this instruction disassembling just fine,
+        # however, the tag that we add will tell us that we did see a
+        # memory error.
+        gdb.disassembler.builtin_disassemble(info)
+        text = info.string + "\t## tag = %s" % tag
+        info.set_result(info.length, text)
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super(TaggingDisassembler, self).__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        text = info.string + "\t## tag = %s" % self._tag
+        info.set_result(info.length, text)
+
+
+class GlobalColorDisassembler(TestDisassembler):
+    """A disassembler that performs syntax highlighting."""
+
+    def disassemble(self, info):
+        gdb.disassembler.builtin_disassemble(info)
+        gdb.disassembler.syntax_highlight(info)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in. Once
+    the call into the disassembler is complete then the DisassembleInfo
+    becomes invalid, and any calls into it should trigger an
+    exception."""
+
+    # This is where we cache the DisassembleInfo object.
+    cached_insn_disas = None
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas = info
+        gdb.disassembler.builtin_disassemble(info)
+        text = info.string + "\t## CACHED"
+        info.set_result(info.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        info = GlobalCachingDisassembler.cached_insn_disas
+        assert isinstance(info, gdb.disassembler.DisassembleInfo)
+        try:
+            val = info.address
+            raise gdb.GdbError("DisassembleInfo.address is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+        try:
+            val = info.string
+            raise gdb.GdbError("DisassembleInfo.string is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.string raised an unexpected exception")
+
+        try:
+            val = info.length
+            raise gdb.GdbError("DisassembleInfo.length is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.length raised an unexpected exception")
+
+        try:
+            val = info.architecture
+            raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError(
+                "DisassembleInfo.architecture raised an unexpected exception"
+            )
+
+        try:
+            val = info.read_memory(1, 0)
+            raise gdb.GdbError("DisassembleInfo.read_memory is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.read_memory raised an unexpected exception")
+
+        try:
+            val = info.set_result(1, "XXX")
+            raise gdb.GdbError("DisassembleInfo.set_result is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError(
+                "DisassembleInfo.set_result raised an unexpected exception"
+            )
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class AnalyzingDisassembler(Disassembler):
+    def __init__(self, name):
+        """Constructor."""
+        super(AnalyzingDisassembler, self).__init__(name)
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # The DisassembleInfo object passed into __call__ as INFO.
+        self._info = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Record INFO, we'll need to refer to this in READ_MEMORY which is
+        # called back to by the builtin disassembler.
+        self._info = info
+        gdb.disassembler.builtin_disassemble(info, self)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and info.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            self._nop_bytes = info.read_memory(info.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(info.length)
+            self._pass_1_insn.append(info.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(info.string)
+
+    def _read_replacement(self, length, offset):
+        """Return a slice of the buffer representing the replacement nop
+        instructions."""
+
+        assert self._nop_bytes is not None
+        rb = self._nop_bytes
+
+        # If this request is outside of a nop instruction then we don't know
+        # what to do, so just raise a memory error.
+        if offset >= len(rb) or (offset + length) > len(rb):
+            raise gdb.MemoryError("invalid length and offset combination")
+
+        # Return only the slice of the nop instruction as requested.
+        s = offset
+        e = offset + length
+        return rb[s:e]
+
+    def read_memory(self, len, offset):
+        """Callback used from the builtin disassembler to read the contents of
+        memory."""
+
+        info = self._info
+        assert info is not None
+
+        # If this request is within the region we are replacing with 'nop'
+        # instructions, then call the helper function to perform that
+        # replacement.
+        if self._start is not None:
+            assert self._end is not None
+            if info.address >= self._start and info.address < self._end:
+                return self._read_replacement(len, offset)
+
+        # Otherwise, we just forward this request to the default read memory
+        # implementation.
+        return info.read_memory(len, offset)
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = self._pass_1_length[replace_idx] // self._pass_1_length[nop_idx]
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+# Create a global instance of the AnalyzingDisassembler.  This isn't
+# registered as a disassembler yet, though; that is done later from
+# py-disasm.exp.
+analyzing_disassembler = AnalyzingDisassembler("AnalyzingDisassembler")
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4


* Re: [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-13 21:59 ` [PATCH 2/5] gdb/python: new gdb.architecture_names function Andrew Burgess
@ 2021-10-14  6:52   ` Eli Zaretskii
  2021-10-22 12:51     ` Andrew Burgess
  2021-10-20 20:40   ` Tom Tromey
  2021-10-22 13:02   ` Simon Marchi
  2 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2021-10-14  6:52 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> From: Andrew Burgess <andrew.burgess@embecosm.com>
> Date: Wed, 13 Oct 2021 22:59:07 +0100
> 
> Add a new function to the Python API, gdb.architecture_names().  This
> function returns a list containing all of the supported architecture
> names within the current build of GDB.
> 
> The values returned in this list are all of the possible values that
> can be returned from gdb.Architecture.name().
> ---
>  gdb/NEWS                             |  4 +++
>  gdb/doc/python.texi                  |  9 +++++
>  gdb/python/py-arch.c                 | 23 +++++++++++++
>  gdb/python/python-internal.h         |  1 +
>  gdb/python/python.c                  |  4 +++
>  gdb/testsuite/gdb.python/py-arch.exp | 51 ++++++++++++++++++++++++++++
>  6 files changed, 92 insertions(+)

The documentation parts are OK, with a single comment:

> +string.  The names returned in this list are the same names as are
> +returned from @code{gdb.Architecture.name ()}
> +(@pxref{gdbpy_architecture_name,,Architecture.name ()}).

Please remove "()" from the references to functions and methods.  This
usage of "()" to indicate a function is misleading, because it looks
like a call to a function with no arguments, and that's not what you
mean here.  Texinfo's markup @code already indicates this is a symbol
and not just a word, so using the parentheses is unnecessary (and
IMNSHO ugly).

Thanks.

* Re: [PATCH 5/5] gdb/python: implement the print_insn extension language hook
  2021-10-13 21:59 ` [PATCH 5/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2021-10-14  7:12   ` Eli Zaretskii
  2021-10-22 17:47     ` Andrew Burgess
  2021-10-22 13:30   ` Simon Marchi
  1 sibling, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2021-10-14  7:12 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> From: Andrew Burgess <andrew.burgess@embecosm.com>
> Date: Wed, 13 Oct 2021 22:59:10 +0100
> 
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d001a03145d..fd1952a2f59 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -32,6 +32,12 @@ maint show internal-warning backtrace
>    internal-error, or an internal-warning.  This is on by default for
>    internal-error and off by default for internal-warning.
>  
> +set style disassembly on|off
> +show style disassembly
> +  If GDB is compiled with Python support, and the Python pygments
> +  module is available, then, when this setting is on, disassembler
> +  output will have styling applied.

If this requires Python with a module that is not available by
default, I think a general style name like "disassembly" would be
misleading.  I suggest "pygment-disassembly" instead.

> +@item set style disassembly @samp{on|off}
> +Enable or disable disassembly styling.  This affects whether
> +disassembly output, such as the output of the @code{disassemble}
> +command, is styled.  The default is @samp{on}.  Note that disassembly
> +styling only works if styling in general is enabled, and if a source
> +highlighting library is available to @value{GDBN}.
> +
> +To highlight disassembly output @value{GDBN} must be compiled with
> +Python support, and the Python Pygments package must be available,

So what does the default ON setting mean if pygments module is not
available, or if GDB was not compiled with Python support?

> +@node Disassembly In Python
> +@cindex Python Instruction Disassembly

Index entries should begin with a lower-case letter, so that sorting
of the entries in the produced manual would not depend on the locale.

> +@defivar DisassembleInfo can_emit_style_escapes
> +This is @code{True} if the output stream that the disassembler is
> +currently printing too can support escape sequences use for colors,
                      ^^^
Should be "to".

> +otherwise this attribute is @code{False}.

Not sure why you are talking about escape sequences: we support
styling with colors also on terminals without escape sequences.  Does
this mean this feature _must_ have actual escape sequence support?

> +@defmethod DisassembleInfo memory_error (offset)
> +This method marks the @code{DisassembleInfo} as having experienced a
> +@code{gdb.MemoryError} when trying to access memory of @var{offset}
> +bytes from @code{DisassembleInfo.address}.

Should this text have a cross-reference to where MemoryError is
described?

> +The optional @var{architecture} is either a string, or the value
> +@code{None}.  If it is a string, then it should be the name of an
> +architecture known to @value{GDBN}, as returned either from
> +@code{gdb.Architecture.name()}
> +(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
> +@code{gdb.architecture_names()}
> +(@pxref{gdb_architecture_names,,gdb.architecture_names}).

Please remove the parentheses from the references to these methods.

> +@defun format_address (architecture, address)
> +Returns @var{address} formatted as a string, in a style suitable for
> +including in the disassembly output of an instruction, for example a
> +formatted address might look like:
> +
> +@smallexample
> +0x00001042 <symbol+16>
> +@end smallexample
> +
> +@var{architecture} is a @code{gdb.Architecture} (@pxref{Architectures
> +In Python}), which is required to format the addresses correctly.
> +This can be obtained from @code{DisassembleInfo.architecture}.

This last paragraph should have @noindent before it, since it's a
continuation of the description of format_address.

> +After calling this function the result in @var{info} @emph{might} have
> +been updated to include syntax highlighting escape sequences.  If
> +syntax highlighting is disabled in @value{GDBN}, or the output stream
> +doesn't support syntax highlighting, then this function will leave
> +@var{info} unchanged.

I suggest a cross-reference to commands that enable syntax
highlighting where you mention it.

> +This function should return a Python object that supports the buffer
> +protocol, i.e. a string, an array, or the object returned from

Please add @: after i.e., to prevent TeX from typesetting that as an
end of a sentence.

Thanks.

* Re: [PATCH 1/5] gdb: make disassembler fprintf callback a static member function
  2021-10-13 21:59 ` [PATCH 1/5] gdb: make disassembler fprintf callback a static member function Andrew Burgess
@ 2021-10-20 20:40   ` Tom Tromey
  2021-10-22 12:51     ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Tom Tromey @ 2021-10-20 20:40 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:

Andrew> The disassemble_info structure has four callbacks, we have three of
Andrew> them as static member functions within gdb_disassembler, the forth is
Andrew> just a global static function.

Andrew> However, this forth callback, is still only used from the

typo, should say "fourth".

Otherwise this looks good, thank you.

Tom

* Re: [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-13 21:59 ` [PATCH 2/5] gdb/python: new gdb.architecture_names function Andrew Burgess
  2021-10-14  6:52   ` Eli Zaretskii
@ 2021-10-20 20:40   ` Tom Tromey
  2021-10-22 13:02   ` Simon Marchi
  2 siblings, 0 replies; 80+ messages in thread
From: Tom Tromey @ 2021-10-20 20:40 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:

Andrew> Add a new function to the Python API, gdb.architecture_names().  This
Andrew> function returns a list containing all of the supported architecture
Andrew> names within the current build of GDB.

Andrew> The values returned in this list are all of the possible values that
Andrew> can be returned from gdb.Architecture.name().

This looks good to me.

Tom

* Re: [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file
  2021-10-13 21:59 ` [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file Andrew Burgess
@ 2021-10-20 20:42   ` Tom Tromey
  2021-10-22 12:52     ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Tom Tromey @ 2021-10-20 20:42 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:

Andrew> So, then I moved all of the Membuf related code out into a new file,
Andrew> gdb/python/py-membuf.c, the interface is gdbpy_buffer_to_membuf, which
Andrew> wraps an array of bytes into a gdb.Membuf object.

I didn't read the patch in too much detail, but I think the general idea
is great.

Andrew> +int gdbpy_initialize_membuf (void)
Andrew> +  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;

This can use just () instead of (void).

Tom

* Re: [PATCH 4/5] gdb: add extension language print_insn hook
  2021-10-13 21:59 ` [PATCH 4/5] gdb: add extension language print_insn hook Andrew Burgess
@ 2021-10-20 21:06   ` Tom Tromey
  0 siblings, 0 replies; 80+ messages in thread
From: Tom Tromey @ 2021-10-20 21:06 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

>>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:

Andrew> What this commit does is put the extension language framework in place
Andrew> for a print_insn hook.  There's a new callback added to 'struct
Andrew> extension_language_ops', which is then filled in with NULL for Python
Andrew> and Guile.

This looks good to me, thanks.

Tom

* Re: [PATCH 1/5] gdb: make disassembler fprintf callback a static member function
  2021-10-20 20:40   ` Tom Tromey
@ 2021-10-22 12:51     ` Andrew Burgess
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2021-10-22 12:51 UTC (permalink / raw)
  To: gdb-patches

* Tom Tromey <tom@tromey.com> [2021-10-20 14:40:28 -0600]:

> >>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:
> 
> Andrew> The disassemble_info structure has four callbacks, we have three of
> Andrew> them as static member functions within gdb_disassembler, the forth is
> Andrew> just a global static function.
> 
> Andrew> However, this forth callback, is still only used from the
> 
> typo, should say "fourth".
> 
> Otherwise this looks good, thank you.

Thanks, I pushed this patch with the fix you suggested.

Andrew

* Re: [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-14  6:52   ` Eli Zaretskii
@ 2021-10-22 12:51     ` Andrew Burgess
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2021-10-22 12:51 UTC (permalink / raw)
  To: gdb-patches

* Eli Zaretskii <eliz@gnu.org> [2021-10-14 09:52:18 +0300]:

> > From: Andrew Burgess <andrew.burgess@embecosm.com>
> > Date: Wed, 13 Oct 2021 22:59:07 +0100
> > 
> > Add a new function to the Python API, gdb.architecture_names().  This
> > function returns a list containing all of the supported architecture
> > names within the current build of GDB.
> > 
> > The values returned in this list are all of the possible values that
> > can be returned from gdb.Architecture.name().
> > ---
> >  gdb/NEWS                             |  4 +++
> >  gdb/doc/python.texi                  |  9 +++++
> >  gdb/python/py-arch.c                 | 23 +++++++++++++
> >  gdb/python/python-internal.h         |  1 +
> >  gdb/python/python.c                  |  4 +++
> >  gdb/testsuite/gdb.python/py-arch.exp | 51 ++++++++++++++++++++++++++++
> >  6 files changed, 92 insertions(+)
> 
> The documentation parts are OK, with a single comment:
> 
> > +string.  The names returned in this list are the same names as are
> > +returned from @code{gdb.Architecture.name ()}
> > +(@pxref{gdbpy_architecture_name,,Architecture.name ()}).
> 
> Please remove "()" from the references to functions and methods.  This
> usage of "()" to indicate a function is misleading, because it looks
> like a call to a function with no arguments, and that's not what you
> mean here.  Texinfo's markup @code already indicates this is a symbol
> and not just a word, so using the parentheses is unnecessary (and
> IMNSHO ugly).

Thanks, I pushed this patch with the fix you suggested.

Andrew

* Re: [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file
  2021-10-20 20:42   ` Tom Tromey
@ 2021-10-22 12:52     ` Andrew Burgess
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2021-10-22 12:52 UTC (permalink / raw)
  To: gdb-patches

* Tom Tromey <tom@tromey.com> [2021-10-20 14:42:06 -0600]:

> >>>>> "Andrew" == Andrew Burgess <andrew.burgess@embecosm.com> writes:
> 
> Andrew> So, then I moved all of the Membuf related code out into a new file,
> Andrew> gdb/python/py-membuf.c, the interface is gdbpy_buffer_to_membuf, which
> Andrew> wraps an array of bytes into a gdb.Membuf object.
> 
> I didn't read the patch in too much detail, but I think the general idea
> is great.
> 
> Andrew> +int gdbpy_initialize_membuf (void)
> Andrew> +  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
> 
> This can use just () instead of (void).

Thanks, I made this change.

As I didn't really change any of the code, just moved it to a new
file, I took your liking the idea as good enough, and pushed this
patch.

Thanks,
Andrew

* Re: [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-13 21:59 ` [PATCH 2/5] gdb/python: new gdb.architecture_names function Andrew Burgess
  2021-10-14  6:52   ` Eli Zaretskii
  2021-10-20 20:40   ` Tom Tromey
@ 2021-10-22 13:02   ` Simon Marchi
  2021-10-22 17:34     ` Andrew Burgess
  2 siblings, 1 reply; 80+ messages in thread
From: Simon Marchi @ 2021-10-22 13:02 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

Hi Andrew,

Sorry to reply only after you have merged this.  I just have a question
about the API.

On 2021-10-13 17:59, Andrew Burgess wrote:
> Add a new function to the Python API, gdb.architecture_names().  This
> function returns a list containing all of the supported architecture
> names within the current build of GDB.
> 
> The values returned in this list are all of the possible values that
> can be returned from gdb.Architecture.name().

Did you consider having a `gdb.architectures()` function, that returns a
list of gdb.Architecture objects?  And then, if you want the names, you
use gdb.Architecture.name:

    for arch in gdb.architectures():
        print(arch.name())

Being able to get the gdb.Architecture objects instead of just the name
sounds more flexible / future-proof to me.  Like, we have
`gdb.breakpoints()`, not `gdb.breakpoint_nums()`.  But there is perhaps
a technical reason why this doesn't work or isn't a good idea.

Simon

* Re: [PATCH 5/5] gdb/python: implement the print_insn extension language hook
  2021-10-13 21:59 ` [PATCH 5/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
  2021-10-14  7:12   ` Eli Zaretskii
@ 2021-10-22 13:30   ` Simon Marchi
  1 sibling, 0 replies; 80+ messages in thread
From: Simon Marchi @ 2021-10-22 13:30 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches

Hi Andrew,

I don't have time to read all the code, so I'll just nit-pick on the
public API.

On 2021-10-13 17:59, Andrew Burgess wrote:
> This commit extends the Python API to include disassembler support,
> and additionally provides a syntax highlighting disassembler.
> 
> The motivation for this commit was to provide an API by which the user
> could write Python scripts that would augment the output of the
> disassembler.
> 
> To achieve this I have followed the model of the existing libopcodes
> disassembler, that is, instructions are disassembled one by one.  This
> does restrict the type of things that it is possible to do from a
> Python script, i.e. all additional output has to fit on a single line,
> but this was all I needed, and creating something more complex would,
> I think, require greater changes to how GDB's internal disassembler
> operates.
> 
> It was only once I had a working prototype that I realised I could
> very easily use this to perform syntax highlighting on GDB's
> disassembly output, so I've included that too.  The new commands added
> are:
> 
>   set style disassembly on|off
>   show style disassembly
> 
> which enable or disable disassembly syntax highlighting.
> 
> The disassembler API is contained in the new gdb.disassembler module,
> which defines the following classes:
> 
>   DisassembleInfo
> 
>       Similar to libopcodes disassemble_info structure, has read-only
>   attributes: address, string, length, architecture, and
>   can_emit_style_escape.  And has methods: read_memory, set_result,
>   and memory_error.
> 
>       Each time GDB wants an instruction disassembled, an instance of
>   this class is passed to a user written disassembler, by reading the
>   attributes, and calling the methods, the user can perform
>   disassembly, and set the result within the DisassembleInfo instance.
> 
>   Disassembler
> 
>       This is a base-class which user written disassemblers should
>   inherit from, just provides base implementations of __init__ and
>   __call__ which the user written disassembler should override.
> 
> The gdb.disassembler module also provides the following functions:
> 
>   register_disassembler
> 
>       This function registers an instance of a Disassembler sub-class
>   as a disassembler, either for one specific architecture, or, as a
>   global disassembler for all architectures.
> 
>   format_address
> 
>       This wraps GDB's print_address function, converting an address
>   into a string that can be placed into disassembler output.
> 
>   syntax_highlight
> 
>       This adds syntax highlighting escapes to some disassembler
>   output, users can call this from their own custom disassemblers to
>   retain syntax highlighting, this function handles switching syntax
>   highlighting off, or the case where the pygments library is not
>   available.
> 
>   builtin_disassemble
> 
>       This provides access to GDB's builtin disassembler.  A common
>   use case that I see is augmenting the existing disassembler
>   output.  The user code can call this function to have GDB
>   disassemble the instruction in the normal way, and then the user can
>   tweak the output before returning that as the result.  This function
>   also provides a mechanism to intercept the disassembler's reads of
>   memory, thus the user can adjust what GDB sees when it is
>   disassembling.
> 
> The included documentation provides a more detailed description of the
> API.
> ---
>  gdb/Makefile.in                        |   1 +
>  gdb/NEWS                               |  42 ++
>  gdb/data-directory/Makefile.in         |   1 +
>  gdb/disasm.c                           |   5 +-
>  gdb/disasm.h                           |  13 +-
>  gdb/doc/gdb.texinfo                    |  14 +
>  gdb/doc/python.texi                    | 252 +++++++
>  gdb/python/lib/gdb/disassembler.py     | 194 ++++++
>  gdb/python/py-arch.c                   |   9 +
>  gdb/python/py-disasm.c                 | 905 +++++++++++++++++++++++++
>  gdb/python/python-internal.h           |  21 +
>  gdb/python/python.c                    |  11 +-
>  gdb/testsuite/gdb.base/style.exp       |  45 +-
>  gdb/testsuite/gdb.python/py-disasm.c   |  25 +
>  gdb/testsuite/gdb.python/py-disasm.exp | 201 ++++++
>  gdb/testsuite/gdb.python/py-disasm.py  | 538 +++++++++++++++
>  16 files changed, 2267 insertions(+), 10 deletions(-)
>  create mode 100644 gdb/python/lib/gdb/disassembler.py
>  create mode 100644 gdb/python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.py
> 
> diff --git a/gdb/Makefile.in b/gdb/Makefile.in
> index ec5d332c145..3981cc9507c 100644
> --- a/gdb/Makefile.in
> +++ b/gdb/Makefile.in
> @@ -392,6 +392,7 @@ SUBDIR_PYTHON_SRCS = \
>  	python/py-breakpoint.c \
>  	python/py-cmd.c \
>  	python/py-continueevent.c \
> +	python/py-disasm.c \
>  	python/py-event.c \
>  	python/py-evtregistry.c \
>  	python/py-evts.c \
> diff --git a/gdb/NEWS b/gdb/NEWS
> index d001a03145d..fd1952a2f59 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -32,6 +32,12 @@ maint show internal-warning backtrace
>    internal-error, or an internal-warning.  This is on by default for
>    internal-error and off by default for internal-warning.
>  
> +set style disassembly on|off
> +show style disassembly
> +  If GDB is compiled with Python support, and the Python pygments
> +  module is available, then, when this setting is on, disassembler
> +  output will have styling applied.
> +
>  * Python API
>  
>    ** New function gdb.add_history(), which takes a gdb.Value object
> @@ -49,6 +55,42 @@ maint show internal-warning backtrace
>       containing all of the possible Architecture.name() values.  Each
>       entry is a string.
>  
> +  ** New Python API for wrapping GDB's disassembler:
> +
> +     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
> +       DISASSEMBLER is a sub-class of gdb.disassembler.Disassembler.
> +       ARCH is either None or a string containing a bfd architecture
> +       name.  DISASSEMBLER is registered as a disassembler for
> +       architecture ARCH, or for all architectures if ARCH is None.
> +       The previous disassembler registered for ARCH is returned, this
> +       can be None if no previous disassembler was registered.
> +
> +     - gdb.disassembler.Disassembler is the class from which all
> +       disassemblers should inherit.  Its constructor takes a string,
> +       a name for the disassembler, which is currently only used in
> +       some debug output.  Sub-classes should override the __call__
> +       method to perform disassembly, invoking __call__ on this base
> +       class will raise an exception.
> +
> +     - gdb.disassembler.DisassembleInfo is the class used to describe
> +       a single disassembly request from GDB.  An instace of this

instace -> instance

> +       class is passed to the __call__ method of
> +       gdb.disassembler.Disassembler and has the following read-only
> +       attributes: 'address', 'string', 'length', 'architecture',
> +       'can_emit_style_escape', and the following methods
> +       'read_memory', 'set_result', and 'memory error'.

Just wondering, why have a "set_result" method instead of just having
the __call__ method return something?

You probably mean 'memory_error' instead of 'memory error'.  But can you
explain when you expect users to manually call "memory_error"?  I would
expect that calling read_memory may raise a gdb.MemoryError, but when
would the user manually generate a memory error?

And regardless of the above, I think it would be more Pythonic to have
the user raise an exception to signal an error, instead of calling a
method.  I'm not sure I understand the use case of calling set_result
and / or memory_error more than once, and have one overwrite the other.
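
To make the alternative concrete, here is a minimal sketch of the
return-and-raise shape described above.  It runs outside GDB, and
FakeMemoryError, FakeInfo, nop_only_disassembler, and run_disassembler
are hypothetical stand-ins invented for illustration; none of them is
part of the proposed gdb.disassembler API.

```python
# Hypothetical stand-ins so the sketch runs outside GDB; none of these
# names are part of the proposed gdb.disassembler API.
class FakeMemoryError(Exception):
    """Plays the role of gdb.MemoryError."""


class FakeInfo:
    """Plays the role of DisassembleInfo: an address plus readable bytes."""

    def __init__(self, address, memory):
        self.address = address
        self._memory = memory  # bytes available starting at ADDRESS

    def read_memory(self, length, offset=0):
        # Reading outside the fake buffer is reported as a memory error.
        if offset < 0 or offset + length > len(self._memory):
            raise FakeMemoryError(
                "cannot read %d bytes at offset %d" % (length, offset)
            )
        return self._memory[offset:offset + length]


def nop_only_disassembler(info):
    """Return (length, text) as the result; memory errors just propagate."""
    byte = info.read_memory(1)
    if byte == b"\x90":  # the single-byte x86 nop encoding
        return 1, "nop"
    return 1, ".byte 0x%02x" % byte[0]


def run_disassembler(disassembler, info):
    """Caller side: the return value is the result, an exception the error."""
    try:
        return disassembler(info)
    except FakeMemoryError as exc:
        return None, "<memory error: %s>" % exc
```

With this shape there is no state to overwrite: a call either returns
one result or raises one error, so the question of calling set_result
or memory_error more than once does not come up.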

> +
> +     - gdb.disassembler.format_address(ARCHITECTURE, ADDRESS), formats
> +       an address into a string so that the string can be included in
> +       the disassembler output.  ARCHITECTURE is a gdb.Architecture
> +       object.

Would it make sense to have that as a
"gdb.Architecture.format_address(ADDRESS)" method instead?  I'm thinking
that you might want to use this in other contexts than disassembly.  You
could always use gdb.disassembler.format_address anyway, but it would be
weird to use the gdb.disassembler module for something not
disassembly-related.

Simon

* Re: [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-22 13:02   ` Simon Marchi
@ 2021-10-22 17:34     ` Andrew Burgess
  2021-10-22 18:42       ` Simon Marchi
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2021-10-22 17:34 UTC (permalink / raw)
  To: Simon Marchi; +Cc: gdb-patches

* Simon Marchi <simon.marchi@polymtl.ca> [2021-10-22 09:02:04 -0400]:

> Hi Andrew,
> 
> Sorry to reply only after you have merged this.  I just have a question
> about the API.

Not a problem, better to get this sorted if something isn't right.

> 
> On 2021-10-13 17:59, Andrew Burgess wrote:
> > Add a new function to the Python API, gdb.architecture_names().  This
> > function returns a list containing all of the supported architecture
> > names within the current build of GDB.
> > 
> > The values returned in this list are all of the possible values that
> > can be returned from gdb.Architecture.name().
> 
> Did you consider having a `gdb.architectures()` function, that returns a
> list of gdb.Architecture objects?  And then, if you want the names, you
> use gdb.Architecture.name:
> 
>     for arch in gdb.architectures():
>         print(arch.name())

Except I don't believe this would work.  A gdb.Architecture goes 1:1
with a gdbarch object, and we can have multiple gdbarch objects for
the same underlying bfd architecture.  For example, if two targets
have the same bfd-architecture, but different target descriptions,
you'll get different gdbarch objects, and different gdb.Architecture
objects.

So, in your above code you have to at least filter for duplicates.

Then, as I understand it, gdbarch objects are only created "on
demand", so in a multi-arch GDB, if I pass an x86 ELF, I don't
believe a risc-v gdb.Architecture is ever created.

So, in your above code, you'll only see the names of architectures
that the user has exposed to GDB.

That doesn't mean a gdb.architectures() method wouldn't be useful in
some other situation, I just don't think it did what I wanted - tell
me all the architectures this GDB knows about.
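
To make the distinction concrete, here is a rough sketch.  The helper
name and its arguments are made up for illustration; the only real API
it relies on is gdb.architecture_names() and gdb.Architecture.name().
The gdb module is passed in as an argument so the sketch can be tried
outside of GDB:

```python
def split_known_and_seen(gdb_module, seen_architectures):
    """Compare the names this GDB build knows with those actually in use.

    Hypothetical helper, not part of GDB.  GDB_MODULE is expected to
    provide architecture_names(), and each item in SEEN_ARCHITECTURES is
    expected to provide name(), as gdb.Architecture does.
    """
    known = set(gdb_module.architecture_names())
    # Several gdbarch objects can share one name, so de-duplicate.
    seen = {arch.name() for arch in seen_architectures}
    # Names that exist in this build but have no architecture object yet.
    return sorted(known), sorted(seen), sorted(known - seen)
```

Inside GDB this could be called with the gdb module itself and, say,
the architecture of the selected frame; the third list is exactly the
part that iterating over architecture objects would miss.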

Maybe what I should do is add an architecture_created event, then I
could do:

  def do_something(arch):
    print(arch.name())

  for arch in gdb.architectures():
    do_something(arch)

  def handler(event):
    do_something(event.architecture)

  gdb.events.architecture_created.connect(handler)

> 
> Being able to get the gdb.Architecture objects instead of just the name
> sounds more flexible / future-proof to me.

I don't think I'm worried that adding this API will be something we
regret.  gdb.Architecture already has a .name method, so being able to
ask for the set of all possible names doesn't seem unreasonable.  For
that reason, I'd like to keep the existing method, even if you think
that the above would be a better API...

>                                             Like, we have
> `gdb.breakpoints()`, not `gdb.breakpoint_nums()`.  But there is perhaps
> a technical reason why this doesn't work or isn't a good idea.

The only difference I see is that the names can exist before the
corresponding architectures, and we can have multiple architectures
per name, it would be like if gdb.breakpoint_nums() only returned the
numbers 1 -> 10 because we could only have that many breakpoints, but,
then we could actually have multiple breakpoints for each
number...... OK, this analogy got weird...

I've added the above additional APIs to my todo list, and I'll try to
get them implemented.

Thanks,
Andrew

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/5] gdb/python: implement the print_insn extension language hook
  2021-10-14  7:12   ` Eli Zaretskii
@ 2021-10-22 17:47     ` Andrew Burgess
  2021-10-22 18:33       ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2021-10-22 17:47 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches

Thanks for the feedback.  I have a couple of questions:

* Eli Zaretskii <eliz@gnu.org> [2021-10-14 10:12:45 +0300]:

> > From: Andrew Burgess <andrew.burgess@embecosm.com>
> > Date: Wed, 13 Oct 2021 22:59:10 +0100
> > 
> > diff --git a/gdb/NEWS b/gdb/NEWS
> > index d001a03145d..fd1952a2f59 100644
> > --- a/gdb/NEWS
> > +++ b/gdb/NEWS
> > @@ -32,6 +32,12 @@ maint show internal-warning backtrace
> >    internal-error, or an internal-warning.  This is on by default for
> >    internal-error and off by default for internal-warning.
> >  
> > +set style disassembly on|off
> > +show style disassembly
> > +  If GDB is compiled with Python support, and the Python pygments
> > +  module is available, then, when this setting is on, disassembler
> > +  output will have styling applied.
> 
> If this requires Python with a module that is not available by
> default, I think a general style name like "disassembly" would be
> misleading.  I suggest "pygment-disassembly" instead.

I'm not sure I agree with this.  I don't see why we'd want to leak an
implementation detail (that we're using the Pygments library) in a
setting name.

Surely, the setting should reflect what the effect is within GDB, and,
as far as possible, the implementation should be hidden from the
user. I'm hoping that some of the clarifications below might make this
more palatable...

> 
> > +@item set style disassembly @samp{on|off}
> > +Enable or disable disassembly styling.  This affects whether
> > +disassembly output, such as the output of the @code{disassemble}
> > +command, is styled.  The default is @samp{on}.  Note that disassembly
> > +styling only works if styling in general is enabled, and if a source
> > +highlighting library is available to @value{GDBN}.
> > +
> > +To highlight disassembly output @value{GDBN} must be compiled with
> > +Python support, and the Python Pygments package must be available,
> 
> So what does the default ON setting mean if pygments module is not
> available, or if GDB was not compiled with Python support?

You're correct, I've reworded this to reflect what actually happens.

First, as this is all implemented in Python, if GDB is compiled
without Python support then this setting (and the underlying feature)
is just not available.

If we do have Python, but not the Pygments library, then this feature
will be off by default, and an attempt to turn it on will give an
error that informs the user that the Python Pygments library is
missing.

Finally, if all the bits are in place, then this feature is on by
default.
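
The three cases above, modelled as a small sketch.  This is not the
code from my tree; the function names and the error text are made up
for illustration, and only importlib.util.find_spec is a real library
call:

```python
import importlib.util


def pygments_available():
    """Return True if the Python Pygments package can be found."""
    return importlib.util.find_spec("pygments") is not None


def initial_style_disassembly(have_python, have_pygments):
    """Model the default of 'set style disassembly' (hypothetical).

    Returns None when the setting does not exist at all (no Python
    support), otherwise the default value of the setting.
    """
    if not have_python:
        return None
    return have_pygments


def set_style_disassembly(value, have_pygments):
    """Model changing the setting (hypothetical); returns the new value."""
    if value and not have_pygments:
        # Turning the feature on without Pygments is an error.
        raise RuntimeError("the Python Pygments library is not available")
    return value
```

Turning the setting off always succeeds; only the off -> on
transition depends on Pygments being present.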

> > +@node Disassembly In Python
> > +@cindex Python Instruction Disassembly
> 
> Index entries should begin with a lower-case letter, so that sorting
> of the entries in the produced manual would not depend on the locale.
> 
> > +@defivar DisassembleInfo can_emit_style_escapes
> > +This is @code{True} if the output stream that the disassembler is
> > +currently printing too can support escape sequences use for colors,
>                       ^^^
> Should be "to".
> 
> > +otherwise this attribute is @code{False}.
> 
> Not sure why you are talking about escape sequences: we support
> styling with colors also on terminals without escape sequences.  Does
> this mean this feature _must_ have actual escape sequence support?

I took the name from the internal GDB functions that do the same
check.  Can you point me at the terminal that does syntax highlighting
without using escape sequences, then I can see how this hooks back
into GDB.

Maybe I should rename this attribute 'supports_styling', or something
similar?

Everything else I've fixed in my local tree.

Thanks,
Andrew


> 
> > +@defmethod DisassembleInfo memory_error (offset)
> > +This method marks the @code{DisassembleInfo} as having experienced a
> > +@code{gdb.MemoryError} when trying to access memory of @var{offset}
> > +bytes from @code{DisassembleInfo.address}.
> 
> Should this text have a cross-reference to where MemoryError is
> described?
> 
> > +The optional @var{architecture} is either a string, or the value
> > +@code{None}.  If it is a string, then it should be the name of an
> > +architecture known to @value{GDBN}, as returned either from
> > +@code{gdb.Architecture.name()}
> > +(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
> > +@code{gdb.architecture_names()}
> > +(@pxref{gdb_architecture_names,,gdb.architecture_names}).
> 
> Please remove the parentheses from the references to these methods.
> 
> > +@defun format_address (architecture, address)
> > +Returns @var{address} formatted as a string, in a style suitable for
> > +including in the disassembly output of an instruction, for example a
> > +formatted address might look like:
> > +
> > +@smallexample
> > +0x00001042 <symbol+16>
> > +@end smallexample
> > +
> > +@var{architecture} is a @code{gdb.Architecture} (@pxref{Architectures
> > +In Python}), which is required to format the addresses correctly.
> > +This can be obtained from @code{DisassembleInfo.architecture}.
> 
> This last paragraph should have @noindent before it, since it's a
> continuation the description of format_address.
> 
> > +After calling this function the result in @var{info} @emph{might} have
> > +been updated to include syntax highlighting escape sequences.  If
> > +syntax highlighting is disabled in @value{GDBN}, or the output stream
> > +doesn't support syntax highlighting, then this function will leave
> > +@var{info} unchanged.
> 
> I suggest a cross-reference to commands that enable syntax
> highlighting where you mention it.
> 
> > +This function should return a Python object that supports the buffer
> > +protocol, i.e. a string, an array, or the object returned from
> 
> Please add @: after i.e., to prevent TeX from typesetting that as an
> end of a sentence.
> 
> Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 5/5] gdb/python: implement the print_insn extension language hook
  2021-10-22 17:47     ` Andrew Burgess
@ 2021-10-22 18:33       ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2021-10-22 18:33 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> Date: Fri, 22 Oct 2021 18:47:09 +0100
> From: Andrew Burgess <andrew.burgess@embecosm.com>
> Cc: gdb-patches@sourceware.org
> 
> > > +set style disassembly on|off
> > > +show style disassembly
> > > +  If GDB is compiled with Python support, and the Python pygments
> > > +  module is available, then, when this setting is on, disassembler
> > > +  output will have styling applied.
> > 
> > If this requires Python with a module that is not available by
> > default, I think a general style name like "disassembly" would be
> > misleading.  I suggest "pygment-disassembly" instead.
> 
> I'm not sure I agree with this.  I don't see why we'd want to leak an
> implementation detail (that we're using the Pygments library) in a
> setting name.
> 
> Surely, the setting should reflect what the effect is within GDB, and,
> as far as possible, the implementation should be hidden from the
> user. I'm hoping that some of the clarifications below might make this
> more palatable...

It's just confusing for a command that doesn't work without Python to
have a name that doesn't somehow hint at Python being required,
especially since we have a lot of "set style" commands that don't
require Python.

> > Not sure why you are talking about escape sequences: we support
> > styling with colors also on terminals without escape sequences.  Does
> > this mean this feature _must_ have actual escape sequence support?
> 
> I took the name from the internal GDB functions that do the same
> check.  Can you point me at the terminal that does syntax highlighting
> without using escape sequences, then I can see how this hooks back
> into GDB.

The MS-Windows console is the one example I know about.

> Maybe I should rename this attribute 'supports_styling', or something
> similar?

Yes, I think that'd be better.

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCH 2/5] gdb/python: new gdb.architecture_names function
  2021-10-22 17:34     ` Andrew Burgess
@ 2021-10-22 18:42       ` Simon Marchi
  0 siblings, 0 replies; 80+ messages in thread
From: Simon Marchi @ 2021-10-22 18:42 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches



On 2021-10-22 13:34, Andrew Burgess wrote:
> * Simon Marchi <simon.marchi@polymtl.ca> [2021-10-22 09:02:04 -0400]:
> 
>> Hi Andrew,
>>
>> Sorry to reply only after you have merged this.  I just have a question
>> about the API.
> 
> Not a problem, better to get this sorted if something isn't right.
> 
>>
>> On 2021-10-13 17:59, Andrew Burgess wrote:
>>> Add a new function to the Python API, gdb.architecture_names().  This
>>> function returns a list containing all of the supported architecture
>>> names within the current build of GDB.
>>>
>>> The values returned in this list are all of the possible values that
>>> can be returned from gdb.Architecture.name().
>>
>> Did you consider having a `gdb.architectures()` function, that returns a
>> list of gdb.Architecture objects?  And then, if you want the names, you
>> use gdb.Architecture.name:
>>
>>     for arch in gdb.architectures():
>>         print(arch.name())
> 
> Except I don't believe this would work.  A gdb.Architecture goes 1:1
> with a gdbarch object, and we can have multiple gdbarch objects for
> the same underlying bfd architecture.  For example, if two targets
> have the same bfd-architecture, but different target descriptions,
> you'll get different gdbarch objects, and different gdb.Architecture
> objects.
> 
> So, in your above code you have to at least filter for duplicates.
> 
> Then, as I understand it, gdbarch objects are only created "on
> demand", so in a multi-arch GDB, if I pass an x86 ELF, I don't
> believe a risc-v gdb.Architecture is ever created.
> 
> So, in your above code, you'll only see the names of architectures
> that the user has exposed to GDB.
> 
> That doesn't mean a gdb.architectures() method wouldn't be useful in
> some other situation, I just don't think it did what I wanted - tell
> me all the architectures this GDB knows about.

Ok, the above makes sense.

> Maybe what I should do is add an architecture_created event, then I
> could do:
> 
>   def do_something(arch):
>     print(arch.name())
> 
>   for arch in gdb.architectures():
>     do_something(arch)
> 
>   def handler(event):
>     do_something(event.architecture)
> 
>   gdb.events.architecture_created.connect(handler)

I don't think that's necessary.

>> Being able to get the gdb.Architecture objects instead of just the name
>> sounds more flexible / future-proof to me.
> 
> I don't think I'm worried that adding this API will be something we
> regret.  gdb.Architecture already has a .name method, so being able to
> ask for the set of all possible names doesn't seem unreasonable.  For
> that reason, I'd like to keep the existing method, even if you think
> that the above would be a better API...

I'm convinced that what you add is ok.  The list of architecture names
isn't the list of all possible gdbarches (and therefore
gdb.Architecture), they are different things.

>> `gdb.breakpoints()`, not `gdb.breakpoint_nums()`.  But there is perhaps
>> a technical reason why this doesn't work or isn't a good idea.
> 
> The only difference I see is that the names can exist before the
> corresponding architectures, and we can have multiple architectures
> per name, it would be like if gdb.breakpoint_nums() only returned the
> numbers 1 -> 10 because we could only have that many breakpoints, but,
> then we could actually have multiple breakpoints for each
> number...... OK, this analogy got weird...
> 
> I've added the above additional APIs to my todo list, and I'll try to
> get them implemented.

I don't think it's necessary, I think what you have is ok.

Simon

^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv2 0/3] Add Python API for the disassembler
  2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
                   ` (4 preceding siblings ...)
  2021-10-13 21:59 ` [PATCH 5/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-03-23 22:41 ` Andrew Burgess
  2022-03-23 22:41   ` [PATCHv2 1/3] gdb: add new base class to gdb_disassembler Andrew Burgess
                     ` (3 more replies)
  5 siblings, 4 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-03-23 22:41 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Finally gotten back to this work!

Changes in v2:

  - The first 3 patches from the v1 series were merged a while back,
    these were all refactoring, or auxiliary features,

  - There's a new #1 patch in the v2 series that does some new
    refactoring of GDB's disassembler classes, this was required in
    order to simplify the #3 patch,

  - Patch #2 in the v2 series is largely unchanged from patch #4 in
    the v1 series,

  - The syntax highlighting work that was in the v1 series was spun
    out into its own patch, and has been merged separately,

  - The format_address helper function that appeared in the v1 series,
    and that Simon suggested I make more general, was spun out into
    its own patch, and merged separately,

  - Finally, patch #3 in the v2 series is pretty much a complete
    rewrite from the v1 series in order to follow the approach
    suggested by Simon.  Results are now returned directly, either via
    'return' or by raising an exception, in contrast to the original
    approach which involved "setting" the result into an existing
    state object.
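
Roughly, the change in style for patch #3 looks like the sketch below.
The class and function names here are stand-ins invented for this
illustration (see patch #3 for the real API); only set_result and
memory_error are taken from the v1 series:

```python
class FakeMemoryError(Exception):
    """Stand-in for gdb.MemoryError, so the sketch runs outside GDB."""


class V1Info:
    """v1 style: the callback records its outcome on a state object."""

    def __init__(self):
        self.result = None
        self.error_offset = None

    def set_result(self, length, text):
        self.result = (length, text)

    def memory_error(self, offset):
        self.error_offset = offset


def v1_disassemble(info, insn_bytes):
    """v1 style callback (hypothetical): nothing is returned."""
    if not insn_bytes:
        info.memory_error(0)
    else:
        info.set_result(len(insn_bytes), "nop")


def v2_disassemble(insn_bytes):
    """v2 style callback (hypothetical): return, or raise on error."""
    if not insn_bytes:
        raise FakeMemoryError("cannot read memory at offset 0")
    return (len(insn_bytes), "nop")
```

With the v2 shape there is no question of what happens when a result
and an error are both set, or set more than once.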

All feedback welcome,

Thanks,
Andrew

---

Andrew Burgess (3):
  gdb: add new base class to gdb_disassembler
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook

 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  34 +
 gdb/arm-tdep.c                         |   4 +-
 gdb/data-directory/Makefile.in         |   1 +
 gdb/disasm.c                           |  64 +-
 gdb/disasm.h                           |  89 ++-
 gdb/doc/python.texi                    | 239 ++++++
 gdb/extension-priv.h                   |  15 +
 gdb/extension.c                        |  20 +
 gdb/extension.h                        |  17 +
 gdb/guile/guile.c                      |   6 +-
 gdb/mips-tdep.c                        |   4 +-
 gdb/python/lib/gdb/disassembler.py     | 109 +++
 gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  16 +
 gdb/python/python.c                    |   3 +
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
 gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
 19 files changed, 2176 insertions(+), 47 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv2 1/3] gdb: add new base class to gdb_disassembler
  2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
@ 2022-03-23 22:41   ` Andrew Burgess
  2022-03-23 22:41   ` [PATCHv2 2/3] gdb: add extension language print_insn hook Andrew Burgess
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-03-23 22:41 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

The motivation for this change is an upcoming Python disassembler API
that I would like to add.  As part of that change I need to create a
new disassembler-like class that contains a disassemble_info and a
gdbarch.  The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.

The problem, however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method.  In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.

What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.

My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.

In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.

Additionally, I have moved the gdb_disassembler::dis_asm_fprintf
method to gdb_disassemble_info::fprintf_func.  This method only makes
use of the disassemble_info::stream member variable, and will be
useful for my future Python disassembler sub-class.

Now, in my future patch, I can inherit from gdb_disassemble_info
instead of gdb_disassembler, I will then be able to obtain the
disassemble_info and gdbarch management, without having to work around
all the ui_file manipulation that gdb_disassembler performs.
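
The shape of the split can be sketched as follows.  This is written in
Python rather than C++ purely for brevity; the class names loosely
mirror the C++ ones, but everything here is an illustration under that
assumption and not the code in this patch:

```python
import io


class DisassembleInfo:
    """Mirrors gdb_disassemble_info: owns the arch and output stream."""

    def __init__(self, arch, stream):
        self._arch = arch
        self._stream = stream

    def arch(self):
        return self._arch

    def fprintf_func(self, fmt, *args):
        """Write formatted disassembler output to the stream."""
        text = fmt % args
        self._stream.write(text)
        return len(text)


class Disassembler(DisassembleInfo):
    """Mirrors gdb_disassembler: buffers output, print_insn flushes it."""

    def __init__(self, arch, dest):
        self._buffer = io.StringIO()
        # The base class is handed the internal buffer, not DEST.
        super().__init__(arch, self._buffer)
        self._dest = dest

    def print_insn(self, mnemonic):
        # Reset the buffer, "disassemble" into it, then flush to DEST.
        self._buffer.seek(0)
        self._buffer.truncate()
        self.fprintf_func("%s", mnemonic)
        self._dest.write(self._buffer.getvalue())
        return len(mnemonic)
```

A class that never calls print_insn can inherit from the base class
directly and write straight to whatever stream it was given.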

There should be no user visible changes after this commit.
---
 gdb/arm-tdep.c  |  4 +--
 gdb/disasm.c    | 35 ++++++++++---------
 gdb/disasm.h    | 89 ++++++++++++++++++++++++++++++++++++-------------
 gdb/mips-tdep.c |  4 +--
 4 files changed, 89 insertions(+), 43 deletions(-)

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index d216d1daff7..89c0734ebc1 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -7759,8 +7759,8 @@ arm_displaced_step_fixup (struct gdbarch *gdbarch,
 static int
 gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   if (arm_pc_is_thumb (gdbarch, memaddr))
diff --git a/gdb/disasm.c b/gdb/disasm.c
index b4cde801cb0..128b097a51a 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -166,7 +166,7 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
 /* Format disassembler output to STREAM.  */
 
 int
-gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+gdb_disassemble_info::fprintf_func (void *stream, const char *format, ...)
 {
   va_list args;
 
@@ -781,24 +781,27 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
-  : m_gdbarch (gdbarch),
+				    read_memory_ftype func)
+  : gdb_disassemble_info (gdbarch, &m_buffer, func,
+			  dis_asm_memory_error, dis_asm_print_address,
+			  fprintf_func),
     m_buffer (!use_ext_lang_colorization_p && disassembler_styling
 	      && file->can_emit_style_escape ()),
     m_dest (file)
+{ /* Nothing.  */ }
+
+/* See disasm.h.  */
+
+gdb_disassemble_info::gdb_disassemble_info
+  (struct gdbarch *gdbarch, struct ui_file *stream,
+   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
+   print_address_ftype print_address_func, fprintf_ftype fprintf_func)
+    : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf);
+  init_disassemble_info (&m_di, stream, fprintf_func);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
-  m_di.print_address_func = dis_asm_print_address;
-  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
-     disassembler had a local optimization here.  By default it would
-     access the executable file, instead of the target memory (there
-     was a growing list of exceptions though).  Unfortunately, the
-     heuristic was flawed.  Commands like "disassemble &variable"
-     didn't work as they relied on the access going to the target.
-     Further, it has been superseeded by trust-read-only-sections
-     (although that should be superseeded by target_trust..._p()).  */
+  m_di.memory_error_func = memory_error_func;
+  m_di.print_address_func = print_address_func;
   m_di.read_memory_func = read_memory_func;
   m_di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
   m_di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
@@ -811,7 +814,9 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
   disassemble_init_for_target (&m_di);
 }
 
-gdb_disassembler::~gdb_disassembler ()
+/* See disasm.h.  */
+
+gdb_disassemble_info::~gdb_disassemble_info ()
 {
   disassemble_free_target (&m_di);
 }
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 399afc5ae71..4499929fe14 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -38,43 +38,87 @@ struct gdbarch;
 struct ui_out;
 struct ui_file;
 
-class gdb_disassembler
-{
-  using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
-
-public:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
-    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
-  {}
-
-  ~gdb_disassembler ();
-
-  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
-
-  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+/* A wrapper around a disassemble_info and a gdbarch.  This is the core
+   set of data that all disassembler sub-classes will need.  This class
+   doesn't actually implement the disassembling process, that is something
+   that sub-classes will do, with each sub-class doing things slightly
+   differently.  */
 
-  /* Return the gdbarch of gdb_disassembler.  */
+struct gdb_disassemble_info
+{
+  /* Types for the function callbacks within disassemble_info.  */
+  using read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using memory_error_ftype = decltype (disassemble_info::memory_error_func);
+  using print_address_ftype = decltype (disassemble_info::print_address_func);
+  using fprintf_ftype = decltype (disassemble_info::fprintf_func);
+
+  /* Constructor, many fields in m_di are initialized from GDBARCH.  STREAM
+     is where the output of the disassembler will be written to, the
+     remaining arguments are function callbacks that are written into
+     m_di.  */
+  gdb_disassemble_info (struct gdbarch *gdbarch,
+			 struct ui_file *stream,
+			 read_memory_ftype read_memory_func,
+			 memory_error_ftype memory_error_func,
+			 print_address_ftype print_address_func,
+			 fprintf_ftype fprintf_func);
+
+  /* Destructor.  */
+  ~gdb_disassemble_info ();
+
+  /* Return the gdbarch we are disassembling for.  */
   struct gdbarch *arch ()
   { return m_gdbarch; }
 
-protected:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
+  /* Return a pointer to the disassemble_info, this will be needed for
+     passing into the libopcodes disassembler.  */
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
 
+protected:
+  /* The stream that disassembler output is being written to.  */
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
-private:
-  struct gdbarch *m_gdbarch;
-
   /* Stores data required for disassembling instructions in
      opcodes.  */
   struct disassemble_info m_di;
 
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+private:
+  /* The architecture we are disassembling for.  */
+  struct gdbarch *m_gdbarch;
+
   /* If we own the string in `m_di.disassembler_options', we do so
      using this field.  */
   std::string m_disassembler_options_holder;
+};
+
+/* A disassembler class that provides 'print_insn', a method for
+   disassembling a single instruction to the output stream.  */
 
+struct gdb_disassembler : public gdb_disassemble_info
+{
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
+    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
+
+  /* Disassemble a single instruction at MEMADDR to the ui_file* that was
+     passed to the constructor.  If a memory error occurs while
+     disassembling this instruction then an error will be thrown.  */
+  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+
+protected:
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    read_memory_ftype func);
+
+private:
   /* This member variable is given a value by calling dis_asm_memory_error.
      If after calling into the libopcodes disassembler we get back a
      negative value (which indicates an error), then, if this variable has
@@ -107,9 +151,6 @@ class gdb_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_fprintf (void *stream, const char *format, ...)
-    ATTRIBUTE_PRINTF(2,3);
-
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index 5cd72ae2451..dd9b86ee8f5 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7004,8 +7004,8 @@ reinit_frame_cache_sfunc (const char *args, int from_tty,
 static int
 gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv2 2/3] gdb: add extension language print_insn hook
  2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
  2022-03-23 22:41   ` [PATCHv2 1/3] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-03-23 22:41   ` Andrew Burgess
  2022-03-23 22:41   ` [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook Andrew Burgess
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
  3 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-03-23 22:41 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit is preparation for the next commit.

In the next commit I will add a Python API to intercept the print_insn
calls within GDB, each print_insn call is responsible for
disassembling, and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.
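
The dispatch order described above, as a small sketch.  It is written
in Python for brevity; the function name and the list-of-hooks
argument are assumptions made for this illustration, while the C++
code in this patch uses ext_lang_print_insn and gdbarch_print_insn:

```python
def print_insn_1(hooks, fallback, arch, address, info):
    """Try each extension-language hook, then fall back (hypothetical).

    Each hook returns the instruction length, or None to decline,
    which plays the role of the empty gdb::optional<int>.
    """
    for hook in hooks:
        length = hook(arch, address, info)
        if length is not None:
            return length
    # No extension language wanted this instruction.
    return fallback(arch, address, info)
```

With every hook declining, as is the case after this commit, the
fallback is always used and behaviour is unchanged.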

There should be no user visible change after this commit.
---
 gdb/disasm.c         | 29 ++++++++++++++++++++++++++---
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 17 +++++++++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 128b097a51a..76f322ad4a9 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -821,6 +821,29 @@ gdb_disassemble_info::~gdb_disassemble_info ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 /* See disasm.h.  */
 
 bool gdb_disassembler::use_ext_lang_colorization_p = true;
@@ -834,7 +857,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
   m_err_memaddr.reset ();
   m_buffer.clear ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   /* If we have successfully disassembled an instruction, styling is on, we
      think that the extension language might be able to perform styling for
@@ -869,7 +892,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
 	  gdb_assert (!m_buffer.term_out ());
 	  m_buffer.~string_file ();
 	  new (&m_buffer) string_file (true);
-	  length = gdbarch_print_insn (arch (), memaddr, &m_di);
+	  length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 	  gdb_assert (length > 0);
 	}
     }
@@ -1011,7 +1034,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index d9450b51231..7c74e721c57 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -263,6 +263,21 @@ struct extension_language_ops
      contents, or an empty optional.  */
   gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
 						 gdbarch *gdbarch);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassemble_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 8f39b86e952..62f41c6445d 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	(extlang->ops->print_insn (gdbarch, address, info));
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 7eb89530c44..f7518f91b35 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -327,6 +327,23 @@ extern gdb::optional<std::string> ext_lang_colorize
 extern gdb::optional<std::string> ext_lang_colorize_disasm
   (const std::string &content, gdbarch *gdbarch);
 
+/* Try to disassemble a single instruction.  ADDRESS is the
+   instruction's apparent address, though bytes for the instruction
+   should be read by calling INFO->read_memory_func as we might be
+   disassembling out of a buffer.  GDBARCH is the architecture in which we
+   are performing the disassembly.
+
+   The disassembled instruction should be printed by calling
+   INFO->fprintf_func, and the length (in octets) of the disassembled
+   instruction should be returned.
+
+   If no instruction could be disassembled then an empty value is returned
+   and GDB will call gdbarch_print_insn to perform the disassembly
+   itself.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #if GDB_SELF_TEST
 namespace selftests {
 extern void (*hook_set_active_ext_lang) ();
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index c040be556a6..50d43e2554b 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index f0d788bf8d5..df794dcd63a 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -166,6 +166,8 @@ static const struct extension_language_ops python_extension_ops =
   gdbpy_colorize,
 
   gdbpy_colorize_disasm,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 #endif /* HAVE_PYTHON */
-- 
2.25.4



* [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook
  2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
  2022-03-23 22:41   ` [PATCHv2 1/3] gdb: add new base class to gdb_disassembler Andrew Burgess
  2022-03-23 22:41   ` [PATCHv2 2/3] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-03-23 22:41   ` Andrew Burgess
  2022-03-24  7:10     ` Eli Zaretskii
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
  3 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-03-23 22:41 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to the libopcodes disassemble_info structure.  Has
  read-only properties address, architecture, and progspace, and
  methods read_memory and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user written disassembler function.  By
  reading the properties and calling the methods (and other support
  methods in the gdb.disassembler module), the user can perform and
  return the disassembly.

  Disassembler

      This is a base-class which user written disassemblers should
  inherit from.  This base class just provides base implementations of
  __init__ and __call__ which the user written disassembler should
  override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler; it's really just a wrapper around a string (the text
  of the disassembled instruction) and a length (in bytes).  The user
  can return an instance of this class from Disassembler.__call__ to
  represent the newly disassembled instruction.
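
To make the relationship between these three classes concrete, here is a
minimal sketch that runs outside of GDB.  The Disassembler class below
mirrors the base class described above, while FakeInfo and Result are
hypothetical stand-ins for DisassembleInfo and DisassemblerResult; the x86
one-byte nop (0x90) is used purely as an illustration:

```python
class Disassembler:
    """Stand-in mirroring the gdb.disassembler.Disassembler base class."""

    def __init__(self, name):
        self.name = name

    def __call__(self, info):
        raise NotImplementedError("Disassembler.__call__")


class Result:
    """Hypothetical result type: only 'length' and 'string' are needed."""

    def __init__(self, length, string):
        self.length = length
        self.string = string


class FakeInfo:
    """Hypothetical stand-in for DisassembleInfo, backed by a bytes object."""

    def __init__(self, address, data):
        self.address = address
        self._data = data

    def read_memory(self, length, offset):
        # Read relative to the instruction address, as the real method does.
        return self._data[offset:offset + length]


class NopDisassembler(Disassembler):
    """Handles only the one-byte 0x90 opcode and declines everything else."""

    def __init__(self):
        super().__init__("NopDisassembler")

    def __call__(self, info):
        if info.read_memory(1, 0) == b"\x90":
            return Result(1, "nop")
        return None  # Let the builtin disassembler handle it.
```

Returning None is how a disassembler declines an instruction, leaving GDB
to fall back to its builtin disassembler.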

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, thus the user can adjust what GDB
  sees when it is disassembling.
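
As an illustration of the memory interception, the sketch below shows one
possible shape for a memory source object.  PatchedMemory and FakeInfo are
hypothetical names introduced for this example only; the one requirement
taken from the documentation is a read_memory (length, offset) method that
returns exactly LENGTH bytes:

```python
class FakeInfo:
    """Hypothetical stand-in for DisassembleInfo, backed by a bytes object."""

    def __init__(self, address, data):
        self.address = address
        self._data = data

    def read_memory(self, length, offset):
        return self._data[offset:offset + length]


class PatchedMemory:
    """Hypothetical memory source that overrides selected bytes."""

    def __init__(self, info, patches):
        self._info = info
        self._patches = patches  # Maps offset -> replacement byte value.

    def read_memory(self, length, offset):
        # Start from the real bytes, then apply any overrides in range.
        data = bytearray(self._info.read_memory(length, offset))
        for i in range(len(data)):
            if offset + i in self._patches:
                data[i] = self._patches[offset + i]
        return bytes(data)  # Same length as the underlying read.
```

Inside GDB such an object would be passed as the second argument, along
the lines of builtin_disassemble (info, PatchedMemory (info, patches)).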

The included documentation provides a more detailed description of the
API.
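
One detail from that documentation worth sketching is the lookup order
used by register_disassembler: an architecture specific disassembler is
preferred over the global one, and registering returns whatever was
previously installed.  The following is a simplified model that runs
outside of GDB; the names register, lookup, and _registry are
hypothetical, and the architecture strings are only examples:

```python
# Keys are architecture names, or None for the global slot.
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER and return the previous entry, if any."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture specific entry, then the global one."""
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)
```

Passing None as the disassembler removes the entry for that architecture,
after which lookups fall back to the global entry again.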
---
 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  34 +
 gdb/data-directory/Makefile.in         |   1 +
 gdb/doc/python.texi                    | 239 ++++++
 gdb/python/lib/gdb/disassembler.py     | 109 +++
 gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  16 +
 gdb/python/python.c                    |   3 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
 gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
 11 files changed, 2003 insertions(+), 1 deletion(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index aecab41eeb8..bbcf8c467dc 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index e10062752d0..723ca6d5fee 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -16,6 +16,40 @@
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a sub-class of
+       gdb.disassembler.Disassembler.  ARCH is either None or a
+       string containing a bfd architecture name.  DISASSEMBLER is
+       registered as a disassembler for architecture ARCH, or for all
+       architectures if ARCH is None.  The previous disassembler
+       registered for ARCH is returned, or None if there was none.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly; invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'architecture', and 'progspace', as well
+       as the following methods: 'read_memory' and 'is_valid'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index 7c414b01d70..8eb112dd99a 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6558,6 +6561,242 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex python instruction disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.  This class has the following attributes:
+
+@defivar DisassembleInfo address
+An integer containing the address at which @value{GDBN} wishes to
+disassemble a single instruction.
+@end defivar
+
+@defivar DisassembleInfo architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defivar DisassembleInfo progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defmethod DisassembleInfo read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).
+@end defmethod
+
+@defmethod DisassembleInfo is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defmethod
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defmethod Disassembler __init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.  Currently, this name is only used in some
+debug output.
+@end defmethod
+
+@defmethod Disassembler __call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None} then this indicates to
+@value{GDBN} that this sub-class doesn't wish to disassemble the
+requested instruction; @value{GDBN} will then use its builtin
+disassembler to perform the disassembly.
+
+Or, this function can return an object that represents the
+disassembled instruction.  The object must have the following two
+attributes:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+The @code{DisassemblerResult} type is defined as a possible class to
+represent disassembled instructions, but it is not required to use
+this type, so long as the required attributes are present.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory, this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error; @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defmethod
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class is provided as a means to hold the result of calling
+@code{Disassembler.__call__}.  It is not required to use this type,
+any type with the required attributes will do.
+
+The required attributes, which this class provides are:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+This class also provides a constructor:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialise an instance of this class, @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+@end deftp
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a @code{Disassembler} sub-class.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used before a global
+disassembler.
+@end defun
+
+@defun builtin_disassemble (info, memory_source)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo}.
+
+If the requested instruction disassembled successfully then an instance
+of @code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then this function will raise a
+@code{gdb.MemoryError} exception.
+
+The optional @var{memory_source} argument has the default value of
+@code{None}, in which case, the builtin disassembler will read the
+instruction from memory in the normal way.
+
+If @var{memory_source} is not @code{None}, then it should be an
+instance of a class that implements the following method:
+
+@defmethod memory_source read_memory (length, offset)
+This method will be called by the builtin disassembler to fetch bytes
+of the instruction being disassembled.  @var{length} is the number of
+bytes to fetch, and @var{offset} is the offset from the address of the
+instruction being disassembled, this address is obtained from
+@code{DisassembleInfo.address}.
+
+This function should return a Python object that supports the buffer
+protocol, i.e.@: a string, an array, or the object returned from
+@code{DisassembleInfo.read_memory}.
+
+The length of the returned buffer @emph{must} be @var{length}
+otherwise a @code{ValueError} exception will be raised.
+
+Alternatively, this function can raise a @code{gdb.MemoryError}
+exception to indicate that the read failed, raising any other
+exception type is an error.
+
+It is important to understand that, even when this function raises a
+@code{gdb.MemoryError}, it is the internal disassembler itself that
+reports the memory error to @value{GDBN}.  The reason for this is that
+the disassembler might probe memory to see if a byte is readable or
+not, if the byte can't be read then the disassembler may choose not to
+report an error, but to instead disassemble the bytes that it does
+have available.
+@end defmethod
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        if result.string is not None:
+            length = result.length
+            text = result.string + "\t## Comment"
+            return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..19ec0ecf82f
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,109 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler, or None.  ARCHITECTURE
+    is either None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality, this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except Exception:
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..9aa1b156023
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,970 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object {
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object {
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_disassemble_info that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_disassemble_info
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  PyObject *address_obj = gdb_py_object_from_longest (address).release ();
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj);
+}
+
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))					\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  if (!disasm_info_object_is_valid (disasm_info))
+    {
+      PyErr_SetString (PyExc_RuntimeError,
+		       _("DisassembleInfo is no longer valid."));
+      return nullptr;
+    }
+
+  /* A memory source is any object that provides the 'read_memory'
+     callback.  At this point we only check for the existence of a
+     'read_memory' attribute, if this isn't callable then we'll throw an
+     exception from within gdbpy_disassembler::read_memory_func.  */
+  if (memory_source_obj != nullptr)
+    {
+      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
+	{
+	  PyErr_SetString (PyExc_TypeError,
+			   _("memory_source doesn't have a read_memory method"));
+	  return nullptr;
+	}
+    }
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers who don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create an object to represent the result of the disassembler.  */
+  gdbpy_ref<disasm_result_object> res
+    (PyObject_New (disasm_result_object, &disasm_result_object_type));
+  res->length = length;
+  res->content = new string_file;
+  *(res->content) = disassembler.release ();
+
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement gdb.disassembler._set_enabled.  Takes a boolean parameter, and
+   sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback.  This
+   will either call the standard read memory function, or, if the user has
+   supplied a memory source (see disasmpy_builtin_disassemble) then this
+   will call back into Python to obtain the memory contents.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+  PyObject *memory_source = dis->m_memory_source;
+
+  /* The simple case, the user didn't pass a separate memory source, so we
+     just delegate to the standard disassemble_info read_memory_func,
+     passing in the original disassemble_info object, which core GDB might
+     require in order to read the instruction bytes (when reading the
+     instruction from a buffer).  */
+  if (memory_source == nullptr)
+    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
+
+  /* The user provided a separate memory source, we need to call the
+     read_memory method on the memory source and use the buffer it returns
+     as the bytes of memory.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
+					       "IL", len, offset));
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read, if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Result from read_memory is incorrectly sized buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  obj->length = length;
+  obj->content->write (string, strlen (string));
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying gdbpy_disassembler, and record MEMADDR as the address of the
+   memory error.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_disassemble_info (obj->gdbarch, &m_string_file, read_memory_func,
+			  memory_error_func, print_address_func,
+			  fprintf_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    m_disasm_info->address = memaddr;
+    m_disasm_info->gdb_info = info;
+    m_disasm_info->gdbarch = gdbarch;
+    m_disasm_info->program_space = current_program_space;
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    m_disasm_info->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* The attribute we are going to lookup that provides the print_insn
+     functionality.  */
+  static const char *callback_name = "_print_insn";
+
+  /* Grab a reference to the gdb.disassembler module, and check it has the
+     attribute that we need.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr
+      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
+				  callback_name))
+    return {};
+
+  /* Now grab the callback attribute from the module.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     callback_name));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fallback to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      PyObject *addr_obj = PyObject_GetAttrString (error_value,
+							   "address");
+	      if (get_addr_from_python (addr_obj, &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("string attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0 || length > max_insn_length)
+    {
+      PyErr_SetString (PyExc_ValueError, _("Invalid length attribute."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm
+(void)
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  /* Having the tp_new field as nullptr means that this class can't be
+     created from user code.  The only way they can be created is from
+     within GDB, and then they are passed into user code.  */
+  gdb_assert (disasm_info_object_type.tp_new == nullptr);
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  0,						/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,				/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset			/* tp_getset */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
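
Not part of the patch: the memory_source contract that
gdbpy_disassembler::read_memory_func enforces in py-disasm.c above can be
modelled outside GDB.  What follows is only a rough sketch in plain
Python.  PatchedMemory and checked_read are hypothetical names, and the
builtin MemoryError stands in for gdb.MemoryError.

```python
class PatchedMemory:
    """Hypothetical memory source that serves bytes from a local buffer."""

    def __init__(self, data):
        self._data = bytes(data)

    def read_memory(self, length, offset=0):
        # OFFSET is relative to the start of the instruction being
        # disassembled, as in DisassembleInfo.read_memory.
        if offset < 0 or offset + length > len(self._data):
            raise MemoryError("cannot read %d bytes at offset %d"
                              % (length, offset))
        return self._data[offset:offset + length]


def checked_read(source, length, offset=0):
    """Model the checks made on a memory source's result.

    Return the bytes read, or None where the C++ code would return -1.
    Exceptions other than MemoryError propagate here; the C++ code
    prints the stack for those and then also returns -1."""
    try:
        result = source.read_memory(length, offset)
    except MemoryError:
        # A memory error is only reported as a failed read.
        return None
    try:
        view = memoryview(result)
    except TypeError:
        # The result does not support the buffer protocol.
        return None
    if view.nbytes != length:
        # The result is an incorrectly sized buffer.
        return None
    return view.tobytes()
```

In real use an object like PatchedMemory would be passed as the
memory_source argument of gdb.disassembler.builtin_disassemble.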
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
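
Not part of the patch: the checks that gdbpy_print_insn applies to the
object returned from Python are roughly the ones sketched below.  This is
a plain Python model under stated assumptions, not the implementation.
Result and validate_result are hypothetical names, and max_insn_length
stands in for the value of gdbarch_max_insn_length.

```python
from collections import namedtuple

# Hypothetical stand-in for gdb.disassembler.DisassemblerResult.
Result = namedtuple("Result", ["length", "string"])


def validate_result(result, max_insn_length):
    """Return the instruction length if RESULT is acceptable.

    Return None otherwise, which models the empty optional that makes
    core GDB fall back to its builtin disassembler."""
    if result is None:
        # The Python disassembler declined this instruction.
        return None
    length = getattr(result, "length", None)
    string = getattr(result, "string", None)
    if not isinstance(string, str) or string == "":
        # The string attribute must be a non-empty string.
        return None
    if not isinstance(length, int):
        return None
    if length <= 0 or length > max_insn_length:
        # The length must be positive and no longer than the
        # architecture's maximum instruction length.
        return None
    return length
```

For example, validate_result (Result (1, "nop"), 15) gives 1, while a
length of 0 or an empty string gives None.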
diff --git a/gdb/python/python.c b/gdb/python/python.c
index df794dcd63a..84e4d64473b 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..ea7847fc6df
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,150 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements, the
+# first element is the name of a class in py-disasm.py, this is a
+# disassembler class.  The second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(analyzing_disassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..a05244dbb1b
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,456 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super(TestDisassembler, self).__init__("TestDisassembler")
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        return self.disassemble(info)
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        len = result.length
+        str = ""
+        for o in range(len):
+            if str != "":
+                str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super(TaggingDisassembler, self).__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in. Once
+    the call into the disassembler is complete then the DisassembleInfo
+    becomes invalid, and any calls into it should trigger an
+    exception."""
+
+    # This is where we cache the DisassembleInfo object.
+    cached_insn_disas = None
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas = info
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        info = GlobalCachingDisassembler.cached_insn_disas
+        assert isinstance(info, gdb.disassembler.DisassembleInfo)
+        assert not info.is_valid()
+        try:
+            val = info.address
+            raise gdb.GdbError("DisassembleInfo.address is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+        try:
+            val = info.architecture
+            raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError(
+                "DisassembleInfo.architecture raised an unexpected exception"
+            )
+
+        try:
+            val = info.read_memory(1, 0)
+            raise gdb.GdbError("DisassembleInfo.read is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.read raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        # Throw a memory error with a specific address.  We don't
+        # expect this address to show up in the output though.
+        raise gdb.MemoryError(0x1234)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        raise gdb.GdbError("the memory source failed")
+
+
+class AnalyzingDisassembler(Disassembler):
+    def __init__(self, name):
+        """Constructor."""
+        super(AnalyzingDisassembler, self).__init__(name)
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # The DisassembleInfo object passed into __call__ as INFO.
+        self._info = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Record INFO, we'll need to refer to this in READ_MEMORY which is
+        # called back to by the builtin disassembler.
+        self._info = info
+        result = gdb.disassembler.builtin_disassemble(info, self)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def _read_replacement(self, length, offset):
+        """Return a slice of the buffer representing the replacement nop
+        instructions."""
+
+        assert self._nop_bytes is not None
+        rb = self._nop_bytes
+
+        # If this request is outside of a nop instruction then we don't know
+        # what to do, so just raise a memory error.
+        if offset >= len(rb) or (offset + length) > len(rb):
+            raise gdb.MemoryError("invalid length and offset combination")
+
+        # Return only the slice of the nop instruction as requested.
+        s = offset
+        e = offset + length
+        return rb[s:e]
+
+    def read_memory(self, len, offset):
+        """Callback used from the builtin disassembler to read the contents of
+        memory."""
+
+        info = self._info
+        assert info is not None
+
+        # If this request is within the region we are replacing with 'nop'
+        # instructions, then call the helper function to perform that
+        # replacement.
+        if self._start is not None:
+            assert self._end is not None
+            if info.address >= self._start and info.address < self._end:
+                return self._read_replacement(len, offset)
+
+        # Otherwise, we just forward this request to the default read memory
+        # implementation.
+        return info.read_memory(len, offset)
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = self._pass_1_length[replace_idx] // self._pass_1_length[nop_idx]
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+# Create a global instance of the AnalyzingDisassembler.  This isn't
+# registered as a disassembler yet though, that is done from the
+# py-disasm.exp later.
+analyzing_disassembler = AnalyzingDisassembler("AnalyzingDisassembler")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4



* Re: [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook
  2022-03-23 22:41   ` [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-03-24  7:10     ` Eli Zaretskii
  2022-03-24 19:51       ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2022-03-24  7:10 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> Date: Wed, 23 Mar 2022 22:41:41 +0000
> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
> 
> From: Andrew Burgess <andrew.burgess@embecosm.com>
> 
> This commit extends the Python API to include disassembler support.

Thanks.

> +If this function returns @code{None} then this indicates to
> +@value{GDBN} that this sub-class doesn't wish to disassemble the
> +requested instruction, @value{GDBN} will then use its builtin
> +disassembler to perform the disassembly.

Suggest to split into 2 sentences and slightly rephrase, for
readability:

  If this function returns @code{None}, this indicates to @value{GDBN}
  that this sub-class doesn't wish to disassemble the requested
  instruction.  @value{GDBN} will then use its builtin disassembler to
  perform the disassembly.

> +@defun register_disassembler (disassembler, architecture)
> +The @var{disassembler} must be a sub-class of @code{Disassembler}.
> +
> +The optional @var{architecture} is either a string, or the value
> +@code{None}.  If it is a string, then it should be the name of an
> +architecture known to @value{GDBN}, as returned either from
> +@code{gdb.Architecture.name}
> +(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
> +@code{gdb.architecture_names}
> +(@pxref{gdb_architecture_names,,gdb.architecture_names}).
> +
> +The @var{disassembler} will be installed for the architecture named by
> +@var{architecture}, or if @var{architecture} is @code{None}, then
> +@var{disassembler} will be installed as a global disassembler for use
> +by all architectures.
> +
> +@value{GDBN} only records a single disassembler for each architecture,
> +and a single global disassembler.  Calling
> +@code{register_disassembler} for an architecture, or for the global
> +disassembler, will replace any existing disassembler registered for
> +that @var{architecture} value.  The previous disassembler is returned.
> +
> +When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
> +first looks for an architecture specific disassembler.  If none has
> +been registered then @value{GDBN} looks for a global disassembler (one
> +registered with @var{architecture} set to @code{None}).  Only one
> +disassembler is called to perform disassembly, so, if there is both an
> +architecture specific disassembler, and a global disassembler
> +registered, it is the architecture specific disassembler that will be
> +used.

This is a very important discussion of how a disassembler for each use
case is looked for.  I think it warrants an index entry, because
readers are likely to need to review this when they work with such a
situation.  So I suggest adding

  @cindex disassembler in Python, global vs.@: specific
  @cindex search order for disassembler in Python
  @cindex look up of disassembler in Python
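
For readers following the thread, the lookup rules in the quoted
paragraphs can be modelled roughly as below.  This is only a sketch in
plain Python under stated assumptions: it does not use the real gdb
module, the `_registry` dictionary and `find_disassembler` helper are
hypothetical names, and the case where a disassembler returns None and
GDB falls back to its builtin disassembler is not modelled.

```python
# Pure-Python model of the registration and lookup rules quoted above.
# It does not import gdb; every name here is a hypothetical stand-in.

# Maps an architecture name (or None for the global slot) to the
# disassembler registered for it.  One entry per slot at most.
_registry = {}


def register_disassembler(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE and return the previous one.

    Passing None as DISASSEMBLER removes any existing entry."""
    previous = _registry.get(architecture)
    if disassembler is None:
        _registry.pop(architecture, None)
    else:
        _registry[architecture] = disassembler
    return previous


def find_disassembler(architecture):
    """Return the architecture specific entry if there is one, otherwise
    the global entry, otherwise None.  Only one is ever selected."""
    if architecture in _registry:
        return _registry[architecture]
    return _registry.get(None)
```

In this model the order of registration does not matter: an
architecture specific entry always wins over the global one, and
removing it lets the global entry take over again, which mirrors what
the "tagging disassembler" tests in py-disasm.exp check.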

> +@value{GDBN} tracks the architecture specific, and global
> +disassemblers separately, so it doesn't matter in which order
> +disassemblers are created or registed, an architecture specific
                                        ^
Please use a semi-colon there instead of a comma, for easier
readability.

> +disassembler, if present, will always be used before a global
> +disassembler.                                 ^^^^^^

Is it really "before" or "in preference to"?

> +If the requested instruction disassembled successfully then an instance
> +of @code{DisassemblerResult} is returned.
> +
> +If the builtin disassembler fails then this function will raise a
> +@code{gdb.MemoryError} exception.

The first sentence uses passive voice for no good reason.  I suggest
rephrasing it similarly to the second sentence.

> +The optional @var{memory_source} argument has the default value of
> +@code{None}, in which case, the builtin disassembler will read the
                             ^
This comma is redundant.

> +Alternatively, this function can raise a @code{gdb.MemoryError}
> +exception to indicate that the read failed, raising any other
> +exception type is an error.

The part starting with "raising any other" should be a separate
sentence.

> +It is important to understand that, even when this function raises a
> +@code{gdb.MemoryError}, it is the internal disassembler itself that
> +reports the memory error to @value{GDBN}.  The reason for this is that
> +the disassembler might probe memory to see if a byte is readable or
> +not, if the byte can't be read then the disassembler may choose not to
      ^
This comma should be a semi-colon.

> +report an error, but to instead disassemble the bytes that it does
                    ^^^^^^^^^^^^^^^^^^^^^^^^^^
I believe "but instead to disassemble" is better.

The documentation parts are okay with those nits fixed.
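
As an illustration of the memory source paragraph quoted above (the
disassembler may probe memory and choose not to report a failed read),
here is a minimal sketch in plain Python.  It is not the GDB
implementation: `FakeMemoryError` stands in for gdb.MemoryError, and
`BufferMemorySource` and `probe_instruction_length` are hypothetical
names invented for this example.

```python
class FakeMemoryError(Exception):
    """Hypothetical stand-in for gdb.MemoryError."""


class BufferMemorySource:
    """Serve read_memory requests from a fixed buffer of bytes."""

    def __init__(self, data):
        self._data = bytes(data)

    def read_memory(self, length, offset=0):
        # Reject any request that falls outside of the buffer.
        if offset < 0 or offset + length > len(self._data):
            raise FakeMemoryError(
                "cannot read %d byte(s) at offset %d" % (length, offset)
            )
        return self._data[offset : offset + length]


def probe_instruction_length(source):
    """Model a disassembler that probes for an optional second byte.

    A failure to read the first byte propagates to the caller.  A
    failure while probing the second byte is swallowed, and the
    instruction is treated as being one byte long."""
    source.read_memory(1, 0)
    try:
        source.read_memory(1, 1)
    except FakeMemoryError:
        return 1
    return 2
```

The point of the sketch is that the memory source only signals that a
read failed; whether that failure becomes a user-visible error is
decided by the code doing the disassembly.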


* Re: [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook
  2022-03-24  7:10     ` Eli Zaretskii
@ 2022-03-24 19:51       ` Andrew Burgess
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-03-24 19:51 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches

Eli Zaretskii via Gdb-patches <gdb-patches@sourceware.org> writes:

>> Date: Wed, 23 Mar 2022 22:41:41 +0000
>> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
>> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
>> 
>> From: Andrew Burgess <andrew.burgess@embecosm.com>
>> 
>> This commit extends the Python API to include disassembler support.
>
> Thanks.
>
>> +If this function returns @code{None} then this indicates to
>> +@value{GDBN} that this sub-class doesn't wish to disassemble the
>> +requested instruction, @value{GDBN} will then use its builtin
>> +disassembler to perform the disassembly.
>
> Suggest to split into 2 sentences and slightly rephrase, for
> readability:
>
>   If this function returns @code{None}, this indicates to @value{GDBN}
>   that this sub-class doesn't wish to disassemble the requested
>   instruction.  @value{GDBN} will then use its builtin disassembler to
>   perform the disassembly.
>
>> +@defun register_disassembler (disassembler, architecture)
>> +The @var{disassembler} must be a sub-class of @code{Disassembler}.
>> +
>> +The optional @var{architecture} is either a string, or the value
>> +@code{None}.  If it is a string, then it should be the name of an
>> +architecture known to @value{GDBN}, as returned either from
>> +@code{gdb.Architecture.name}
>> +(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
>> +@code{gdb.architecture_names}
>> +(@pxref{gdb_architecture_names,,gdb.architecture_names}).
>> +
>> +The @var{disassembler} will be installed for the architecture named by
>> +@var{architecture}, or if @var{architecture} is @code{None}, then
>> +@var{disassembler} will be installed as a global disassembler for use
>> +by all architectures.
>> +
>> +@value{GDBN} only records a single disassembler for each architecture,
>> +and a single global disassembler.  Calling
>> +@code{register_disassembler} for an architecture, or for the global
>> +disassembler, will replace any existing disassembler registered for
>> +that @var{architecture} value.  The previous disassembler is returned.
>> +
>> +When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
>> +first looks for an architecture specific disassembler.  If none has
>> +been registered then @value{GDBN} looks for a global disassembler (one
>> +registered with @var{architecture} set to @code{None}).  Only one
>> +disassembler is called to perform disassembly, so, if there is both an
>> +architecture specific disassembler, and a global disassembler
>> +registered, it is the architecture specific disassembler that will be
>> +used.
>
> This is a very important discussion of how a disassembler for each use
> case is looked for.  I think it warrants an index entry, because
> readers are likely to need to review this when they work with such
> situation.  So I suggest to add
>
>   @cindex disassembler in Python, global vs.@: specific
>   @cindex search order for disassembler in Python
>   @cindex look up of disassembler in Python
>
>> +@value{GDBN} tracks the architecture specific, and global
>> +disassemblers separately, so it doesn't matter in which order
>> +disassemblers are created or registed, an architecture specific
>                                         ^
> Please use a semi-colon there instead of a comma, for easier
> readability.
>
>> +disassembler, if present, will always be used before a global
>> +disassembler.                                 ^^^^^^
>
> Is it really "before" or "in preference to"?
>
>> +If the requested instruction disassembled successfully then an instance
>> +of @code{DisassemblerResult} is returned.
>> +
>> +If the builtin disassembler fails then this function will raise a
>> +@code{gdb.MemoryError} exception.
>
> The first sentence uses passive tense for no good reason.  Suggest to
> rephrase similarly to the second sentence.
>
>> +The optional @var{memory_source} argument has the default value of
>> +@code{None}, in which case, the builtin disassembler will read the
>                              ^
> This comma is redundant.
>
>> +Alternatively, this function can raise a @code{gdb.MemoryError}
>> +exception to indicate that the read failed, raising any other
>> +exception type is an error.
>
> The part starting with "raising any other" should be a separate
> sentence.
>
>> +It is important to understand that, even when this function raises a
>> +@code{gdb.MemoryError}, it is the internal disassembler itself that
>> +reports the memory error to @value{GDBN}.  The reason for this is that
>> +the disassembler might probe memory to see if a byte is readable or
>> +not, if the byte can't be read then the disassembler may choose not to
>       ^
> This comma should be a semi-colon.
>
>> +report an error, but to instead disassemble the bytes that it does
>                     ^^^^^^^^^^^^^^^^^^^^^^^^^^
> I believe "but instead to disassemble" is better.
>
> The documentation parts are okay with those nits fixed.

Thanks for all the great feedback.  I've addressed all your points in
the updated patch below.
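
For anyone skimming the thread, the lookup order that the documentation
now describes (architecture specific first, then global) can be modelled
outside GDB with a few lines of plain Python.  This is only a sketch of
the behaviour of register_disassembler and _print_insn in the patch, not
the patch code itself: it does not import the gdb module, the names
_registry, register, and lookup are hypothetical, and strings stand in
for Disassembler instances.

```python
# Stand-alone model of the registration and lookup rules; runs outside GDB.
# _registry, register and lookup are hypothetical names, not part of the
# gdb.disassembler API.  Strings stand in for Disassembler instances.

_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE (None means global).

    Passing None as DISASSEMBLER removes the entry.  Returns whatever
    was previously registered for ARCHITECTURE, or None."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Return the disassembler to use for ARCH_NAME, preferring an
    architecture specific entry over the global one.  Returns None when
    nothing suitable is registered, or when ARCH_NAME is None."""
    if arch_name is None:
        return None
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)


register("global-dis")                 # global, registered first
register("riscv-dis", "riscv:rv64")    # architecture specific
print(lookup("riscv:rv64"))            # the architecture specific entry
print(lookup("i386:x86-64"))           # falls back to the global entry
print(register("new-global"))          # the replaced entry is returned
```

Registration order has no effect on the result because the two kinds of
entry live under different keys; only the key checked first matters.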

Thanks,
Andrew

---

commit 812df206e8e9c798e8e6e279dfb7b24fcda19bff
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Fri Sep 17 18:12:34 2021 +0100

    gdb/python: implement the print_insn extension language hook
    
    This commit extends the Python API to include disassembler support.
    
    The motivation for this commit was to provide an API by which the user
    could write Python scripts that would augment the output of the
    disassembler.
    
    To achieve this I have followed the model of the existing libopcodes
    disassembler, that is, instructions are disassembled one by one.  This
    does restrict the type of things that it is possible to do from a
    Python script, i.e. all additional output has to fit on a single line,
    but this was all I needed, and creating something more complex would,
    I think, require greater changes to how GDB's internal disassembler
    operates.
    
    The disassembler API is contained in the new gdb.disassembler module,
    which defines the following classes:
    
      DisassembleInfo
    
          Similar to the libopcodes disassemble_info structure.  It has
      read-only properties address, architecture, and progspace, and
      methods read_memory and is_valid.
    
          Each time GDB wants an instruction disassembled, an instance of
      this class is passed to a user written disassembler function.  By
      reading the properties, and calling the methods (and other support
      methods in the gdb.disassembler module) the user can perform and
      return the disassembly.
    
      Disassembler
    
          This is a base-class which user written disassemblers should
      inherit from.  This base class just provides base implementations of
      __init__ and __call__ which the user written disassembler should
      override.
    
      DisassemblerResult
    
          This class can be used to hold the result of a call to the
      disassembler; it's really just a wrapper around a string (the text
      of the disassembled instruction) and a length (in bytes).  The user
      can return an instance of this class from Disassembler.__call__ to
      represent the newly disassembled instruction.
    
    The gdb.disassembler module also provides the following functions:
    
      register_disassembler
    
          This function registers an instance of a Disassembler sub-class
      as a disassembler, either for one specific architecture, or as a
      global disassembler for all architectures.
    
      builtin_disassemble
    
          This provides access to GDB's builtin disassembler.  A common
      use case that I see is augmenting the existing disassembler output.
      The user code can call this function to have GDB disassemble the
      instruction in the normal way.  The user gets back a
      DisassemblerResult object, which they can then read in order to
      augment the disassembler output in any way they wish.
    
          This function also provides a mechanism to intercept the
      disassembler's reads of memory, thus the user can adjust what GDB
      sees when it is disassembling.
    
    The included documentation provides a more detailed description of the
    API.

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index aecab41eeb8..bbcf8c467dc 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index e10062752d0..723ca6d5fee 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -16,6 +16,40 @@
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a gdb.disassembler.Disassembler
+       sub-class.  ARCH is either None or a string containing a bfd
+       architecture name.  DISASSEMBLER is registered as a disassembler
+       for architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned; this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly; invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'architecture', and 'progspace', as well
+       as the following methods: 'read_memory' and 'is_valid'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index 7c414b01d70..b30173dcfbb 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6558,6 +6561,251 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex python instruction disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.  This class has the following properties and
+methods:
+
+@defivar DisassembleInfo address
+An integer containing the address at which @value{GDBN} wishes to
+disassemble a single instruction.
+@end defivar
+
+@defivar DisassembleInfo architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defivar DisassembleInfo progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defmethod DisassembleInfo read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).
+@end defmethod
+
+@defmethod DisassembleInfo is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created, has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defmethod
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defmethod Disassembler __init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.  Currently, this name is only used in some
+debug output.
+@end defmethod
+
+@defmethod Disassembler __call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Or, this function can return an object that represents the
+disassembled instruction.  The object must have the following two
+attributes:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+The @code{DisassemblerResult} type is defined as a possible class to
+represent disassembled instructions, but it is not required to use
+this type, so long as the required attributes are present.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory; this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error; @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defmethod
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class can be used to hold the result of calling
+@w{@code{Disassembler.__call__}}.  It is not required to use this
+type; any type with the required attributes will do.
+
+The required properties, which this class provides, are:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+This class also provides a constructor:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialise an instance of this class.  @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+@end defun
+
+@defun builtin_disassemble (info, memory_source)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo}.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then a @code{gdb.MemoryError}
+exception will be raised.
+
+The optional @var{memory_source} argument has the default value of
+@code{None}, in which case the builtin disassembler will read the
+instruction from memory in the normal way.
+
+If @var{memory_source} is not @code{None}, then it should be an
+instance of a class that implements the following method:
+
+@defmethod memory_source read_memory (length, offset)
+This method will be called by the builtin disassembler to fetch bytes
+of the instruction being disassembled.  @var{length} is the number of
+bytes to fetch, and @var{offset} is the offset from the address of the
+instruction being disassembled; this address is obtained from
+@code{DisassembleInfo.address}.
+
+This function should return a Python object that supports the buffer
+protocol, i.e.@: a string, an array, or the object returned from
+@code{DisassembleInfo.read_memory}.
+
+The length of the returned buffer @emph{must} be @var{length},
+otherwise a @code{ValueError} exception will be raised.
+
+Alternatively, this function can raise a @code{gdb.MemoryError}
+exception to indicate that the read failed.  Raising any other
+exception type is an error.
+
+It is important to understand that, even when this function raises a
+@code{gdb.MemoryError}, it is the internal disassembler itself that
+reports the memory error to @value{GDBN}.  The reason for this is that
+the disassembler might probe memory to see if a byte is readable or
+not; if the byte can't be read then the disassembler may choose not to
+report an error, but instead to disassemble the bytes that it does
+have available.
+@end defmethod
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        if result.string is not None:
+            length = result.length
+            text = result.string + "\t## Comment"
+            return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..19ec0ecf82f
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,109 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler.  ARCHITECTURE is either
+    None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality; this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except:
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..9aa1b156023
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,970 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object {
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object {
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_disassemble_info
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  gdbpy_ref<> address_obj = gdb_py_object_from_longest (address);
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj.get ());
+}
+
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))					\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  if (!disasm_info_object_is_valid (disasm_info))
+    {
+      PyErr_SetString (PyExc_RuntimeError,
+		       _("DisassembleInfo is no longer valid."));
+      return nullptr;
+    }
+
+  /* A memory source is any object that provides the 'read_memory'
+     callback.  At this point we only check for the existence of a
+     'read_memory' attribute, if this isn't callable then we'll throw an
+     exception from within gdbpy_disassembler::read_memory_func.  */
+  if (memory_source_obj != nullptr)
+    {
+      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
+	{
+	  PyErr_SetString (PyExc_TypeError,
+			   _("memory_source doesn't have a read_memory method"));
+	  return nullptr;
+	}
+    }
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers that don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create an object to represent the result of the disassembler.  */
+  gdbpy_ref<disasm_result_object> res
+    (PyObject_New (disasm_result_object, &disasm_result_object_type));
+  res->length = length;
+  res->content = new string_file;
+  *(res->content) = disassembler.release ();
+
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement _gdb.disassembler._set_enabled.  Takes a boolean parameter, and
+   sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback.  This
+   will either call the standard read memory function, or, if the user has
+   supplied a memory source (see disasmpy_builtin_disassemble) then this
+   will call back into Python to obtain the memory contents.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+  PyObject *memory_source = dis->m_memory_source;
+
+  /* The simple case, the user didn't pass a separate memory source, so we
+     just delegate to the standard disassemble_info read_memory_func,
+     passing in the original disassemble_info object, which core GDB might
+     require in order to read the instruction bytes (when reading the
+     instruction from a buffer).  */
+  if (memory_source == nullptr)
+    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
+
+  /* The user provided a separate memory source, we need to call the
+     read_memory method on the memory source and use the buffer it returns
+     as the bytes of memory.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
+					       "IL", len, (long long) offset));
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read, if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Result from read_memory is incorrectly sized buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  obj->length = length;
+  obj->content->write (string, strlen (string));
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying gdbpy_disassembler, and record the address of the memory
+   error in it.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_disassemble_info (obj->gdbarch, &m_string_file, read_memory_func,
+			  memory_error_func, print_address_func,
+			  fprintf_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    m_disasm_info->address = memaddr;
+    m_disasm_info->gdb_info = info;
+    m_disasm_info->gdbarch = gdbarch;
+    m_disasm_info->program_space = current_program_space;
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    m_disasm_info->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* The attribute we are going to look up that provides the print_insn
+     functionality.  */
+  static const char *callback_name = "_print_insn";
+
+  /* Grab a reference to the gdb.disassembler module, and check it has the
+     attribute that we need.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr
+      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
+				  callback_name))
+    return {};
+
+  /* Now grab the callback attribute from the module.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     callback_name));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fall back to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      gdbpy_ref<> addr_obj (PyObject_GetAttrString (error_value,
+							    "address"));
+	      if (get_addr_from_python (addr_obj.get (), &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("string attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0 || length > max_insn_length)
+    {
+      PyErr_SetString (PyExc_ValueError, _("Invalid length attribute."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm
+(void)
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  /* Having the tp_new field as nullptr means that instances of this class
+     can't be created from user code.  The only way they can be created is
+     from within GDB, and then they are passed into user code.  */
+  gdb_assert (disasm_info_object_type.tp_new == nullptr);
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  0,						/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,				/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset			/* tp_getset */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
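An aside on the memory_source argument to builtin_disassemble, since the
contract is spread across gdbpy_disassembler::read_memory_func above.  The
object only needs a read_memory method, which is called with the requested
length and the offset from the start of the instruction, and which must
return a buffer of exactly that many bytes.  Below is a minimal sketch of
such an object, not code from this series.  The class name and the
error_type parameter are hypothetical; inside GDB the error type would be
gdb.MemoryError, and the Python builtin MemoryError is only a stand-in so
that the sketch runs outside GDB.

```python
class PatchedMemorySource:
    """Hypothetical memory source serving instruction bytes from a buffer."""

    def __init__(self, data, error_type=MemoryError):
        self._data = bytes(data)
        # Assumption: callers inside GDB pass gdb.MemoryError here.  The
        # builtin MemoryError default just keeps the sketch self-contained.
        self._error_type = error_type

    def read_memory(self, length, offset=0):
        # OFFSET is relative to the start of the instruction being
        # disassembled, matching how read_memory_func computes it.
        end = offset + length
        if length <= 0 or offset < 0 or end > len(self._data):
            raise self._error_type(
                "cannot read %d bytes at offset %d" % (length, offset))
        # bytes supports the buffer protocol, and the slice is exactly
        # LENGTH bytes long, which is what the C++ side checks for.
        return self._data[offset:end]
```

Within GDB this would presumably be used as
builtin_disassemble (info, memory_source=PatchedMemorySource (b"\x90", gdb.MemoryError)).
Raising the memory error, rather than returning a short buffer, matters
because read_memory_func quietly reports a gdb.MemoryError as a failed
read, while a wrongly sized buffer is treated as a bug and has its stack
printed.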
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
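For reference, the checks that gdbpy_print_insn applies to whatever the
Python hook returns can be restated in Python as below.  This is only an
illustrative sketch of the rules in py-disasm.c: check_result, Result and
INT_MAX are hypothetical names, the integer check approximates
gdb_py_int_as_long, and the INT_MAX value assumes a 32-bit int on the
host.  In GDB a failed check does not propagate an exception; the stack is
printed and an empty value is returned, so core GDB falls back to the
builtin disassembler.

```python
from collections import namedtuple

# Assumption: stands in for the C INT_MAX fallback used when the
# architecture does not define a maximum instruction length.
INT_MAX = 2**31 - 1

# Stand-in for gdb.disassembler.DisassemblerResult, for use outside GDB.
Result = namedtuple("Result", ["length", "string"])


def check_result(result, max_insn_length=None):
    """Return (length, string) if RESULT passes the checks, else raise."""
    length, string = result.length, result.string
    if not isinstance(string, str):
        raise TypeError("string attribute is not a string.")
    if not isinstance(length, int):
        raise TypeError("length attribute is not an integer.")
    limit = INT_MAX if max_insn_length is None else max_insn_length
    if length <= 0 or length > limit:
        raise ValueError("Invalid length attribute.")
    if len(string) == 0:
        raise ValueError("string attribute must not be empty.")
    return length, string
```

So a result such as Result (1, "nop") is accepted, while a length of zero,
a length above the architecture's maximum, or an empty string is rejected.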
diff --git a/gdb/python/python.c b/gdb/python/python.c
index df794dcd63a..84e4d64473b 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..ea7847fc6df
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,150 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements, the
+# first element is the name of a class in py-disasm.py, this is a
+# disassembler class.  The second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(analyzing_disassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..a05244dbb1b
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,456 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super(TestDisassembler, self).__init__("TestDisassembler")
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        return self.disassemble(info)
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        len = result.length
+        str = ""
+        for o in range(len):
+            if str != "":
+                str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super(TaggingDisassembler, self).__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in. Once
+    the call into the disassembler is complete then the DisassembleInfo
+    becomes invalid, and any calls into it should trigger an
+    exception."""
+
+    # This is where we cache the DisassembleInfo object.
+    cached_insn_disas = None
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas = info
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        info = GlobalCachingDisassembler.cached_insn_disas
+        assert isinstance(info, gdb.disassembler.DisassembleInfo)
+        assert not info.is_valid()
+        try:
+            val = info.address
+            raise gdb.GdbError("DisassembleInfo.address is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+        try:
+            val = info.architecture
+            raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError(
+                "DisassembleInfo.architecture raised an unexpected exception"
+            )
+
+        try:
+            val = info.read_memory(1, 0)
+            raise gdb.GdbError("DisassembleInfo.read is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.read raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        # Throw a memory error with a specific address.  We don't
+        # expect this address to show up in the output though.
+        raise gdb.MemoryError(0x1234)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        raise gdb.GdbError("the memory source failed")
+
+
+class AnalyzingDisassembler(Disassembler):
+    def __init__(self, name):
+        """Constructor."""
+        super(AnalyzingDisassembler, self).__init__(name)
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # The DisassembleInfo object passed into __call__ as INFO.
+        self._info = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Record INFO, we'll need to refer to this in READ_MEMORY which is
+        # called back to by the builtin disassembler.
+        self._info = info
+        result = gdb.disassembler.builtin_disassemble(info, self)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def _read_replacement(self, length, offset):
+        """Return a slice of the buffer representing the replacement nop
+        instructions."""
+
+        assert self._nop_bytes is not None
+        rb = self._nop_bytes
+
+        # If this request is outside of a nop instruction then we don't know
+        # what to do, so just raise a memory error.
+        if offset >= len(rb) or (offset + length) > len(rb):
+            raise gdb.MemoryError("invalid length and offset combination")
+
+        # Return only the slice of the nop instruction as requested.
+        s = offset
+        e = offset + length
+        return rb[s:e]
+
+    def read_memory(self, len, offset):
+        """Callback used from the builtin disassembler to read the contents of
+        memory."""
+
+        info = self._info
+        assert info is not None
+
+        # If this request is within the region we are replacing with 'nop'
+        # instructions, then call the helper function to perform that
+        # replacement.
+        if self._start is not None:
+            assert self._end is not None
+            if info.address >= self._start and info.address < self._end:
+                return self._read_replacement(len, offset)
+
+        # Otherwise, we just forward this request to the default read memory
+        # implementation.
+        return info.read_memory(len, offset)
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = int(self._pass_1_length[replace_idx] / self._pass_1_length[nop_idx])
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+# Create a global instance of the AnalyzingDisassembler.  This isn't
+# registered as a disassembler yet though, that is done from the
+# py-disasm.exp later.
+analyzing_disassembler = AnalyzingDisassembler("AnalyzingDisassembler")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
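The memory source idea exercised by AnalyzingDisassembler above can be tried outside GDB.  What follows is a minimal standalone sketch, not code from the patch: ReplacementSource and SourceReadError are hypothetical names, the explicit insn_addr argument stands in for info.address, and a plain bytes object stands in for inferior memory.

```python
# Standalone sketch: no gdb module is needed.  ReplacementSource and
# SourceReadError are hypothetical stand-ins, not part of the GDB API.


class SourceReadError(Exception):
    """Stand-in for gdb.MemoryError in this sketch."""


class ReplacementSource:
    """Serve reads from REAL, except that instructions whose address lies
    in [start, end) are read from NOP_BYTES instead."""

    def __init__(self, real, base, start, end, nop_bytes):
        self._real = real      # bytes backing the "real" memory
        self._base = base      # address of real[0]
        self._start = start    # first address of the replaced range
        self._end = end        # one past the last replaced address
        self._nop = nop_bytes  # encoding of a single nop

    def read_memory(self, insn_addr, length, offset=0):
        if self._start <= insn_addr < self._end:
            # Similar to _read_replacement: only serve requests that
            # fall entirely inside one nop.
            if offset < 0 or offset + length > len(self._nop):
                raise SourceReadError("invalid length and offset combination")
            return self._nop[offset : offset + length]

        # Otherwise forward the request to the real memory.
        lo = insn_addr - self._base + offset
        if lo < 0 or lo + length > len(self._real):
            raise SourceReadError("address out of range")
        return self._real[lo : lo + length]


# Five bytes of made-up code at 0x1000; 0x1001..0x1003 get replaced.
src = ReplacementSource(bytes([0x55, 0x48, 0x89, 0xE5, 0x90]),
                        0x1000, 0x1001, 0x1004, b"\x90")
print(src.read_memory(0x1000, 1).hex())
print(src.read_memory(0x1001, 1).hex())
```

As in the test script, a request that reaches past the end of a single nop inside the replaced range is rejected rather than guessed at.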



* [PATCHv3 0/6] Add Python API for the disassembler
  2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
                     ` (2 preceding siblings ...)
  2022-03-23 22:41   ` [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-04-04 22:19   ` Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file Andrew Burgess
                       ` (6 more replies)
  3 siblings, 7 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes in v3:

  - Rebased to current master, and retested,

  - Patch #1 is new in this series,

  - Patch #2 is changed slightly from v2: I've reworked the
    disassembler classes in a slightly different way, in order to
    prepare for patches #5 and #6,

  - Patch #3 is unchanged from v2,

  - Patch #4 is unchanged from v2,

  - Patch #5 is new in v3.  I've included it here as the changes in #2
    only make sense knowing that patch #5 is coming,

  - Patch #6 is a small cleanup only possible after #2 and #5 have landed.

Changes in v2:

  - The first 3 patches from the v1 series were merged a while back,
    these were all refactoring, or auxiliary features,

  - There's a new #1 patch in the v2 series that does some new
    refactoring of GDB's disassembler classes, this was required in
    order to simplify the #3 patch,

  - Patch #2 in the v2 series is largely unchanged from patch #4 in
    the v1 series,

  - The syntax highlighting work that was in the v1 series was spun
    out into its own patch, and has been merged separately,

  - The format_address helper function that appeared in the v1 series,
    and that Simon suggested I make more general, was spun out into
    its own patch, and merged separately,

  - Finally, patch #3 in the v2 series is pretty much a complete
    rewrite from the v1 series in order to follow the approach
    suggested by Simon.  Results are now returned directly, either via
    'return' or by raising an exception, in contrast to the original
    approach which involved "setting" the result into an existing
    state object.
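The "return or raise" style described in the last item can be illustrated without GDB.  This is only a sketch under stated assumptions: Result, FakeMemoryError, fake_builtin and run are hypothetical stand-ins for DisassemblerResult, gdb.MemoryError, the builtin disassembler and GDB's calling code, and the hooks take a bare address rather than a DisassembleInfo.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Result:
    """Hypothetical stand-in for DisassemblerResult (length, string)."""
    length: int
    string: str


class FakeMemoryError(Exception):
    """Hypothetical stand-in for gdb.MemoryError."""


def fake_builtin(address: int) -> Result:
    """Hypothetical stand-in for the builtin disassembler."""
    return Result(1, "nop")


def tagging(address: int) -> Optional[Result]:
    """Return a result directly, or None to decline this address."""
    if address % 2:
        return None
    base = fake_builtin(address)
    return Result(base.length, base.string + "\t## tagged")


def faulting(address: int) -> Optional[Result]:
    """Report a failure by raising, not by setting state on an object."""
    raise FakeMemoryError(address)


def run(hook: Callable[[int], Optional[Result]], address: int) -> str:
    """Toy caller: use the hook's result, fall back to the builtin when
    the hook returns None, and turn a memory error into a message."""
    try:
        result = hook(address)
    except FakeMemoryError as exc:
        return "Cannot access memory at address 0x%x" % exc.args[0]
    if result is None:
        result = fake_builtin(address)
    return result.string


print(run(tagging, 0x1000))
print(run(tagging, 0x1001))
print(run(faulting, 0x2))
```

The hook never mutates a shared object, so the caller can tell "declined", "succeeded" and "failed" apart from the return value and the exception type alone.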

---

Andrew Burgess (6):
  gdb: move gdb_disassembly_flag into a new disasm-flags.h file
  gdb: add new base class to gdb_disassembler
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook
  gdb: refactor the non-printing disassemblers
  gdb: unify two dis_asm_read_memory functions in disasm.c

 gdb/Makefile.in                        |   2 +
 gdb/NEWS                               |  34 +
 gdb/arc-linux-tdep.c                   |  16 +-
 gdb/arc-tdep.c                         |  29 +-
 gdb/arc-tdep.h                         |   5 -
 gdb/arm-tdep.c                         |   4 +-
 gdb/data-directory/Makefile.in         |   1 +
 gdb/disasm-flags.h                     |  40 +
 gdb/disasm-selftests.c                 |  70 +-
 gdb/disasm.c                           | 172 ++---
 gdb/disasm.h                           | 221 ++++--
 gdb/doc/python.texi                    | 248 +++++++
 gdb/extension-priv.h                   |  15 +
 gdb/extension.c                        |  20 +
 gdb/extension.h                        |  17 +
 gdb/guile/guile.c                      |   6 +-
 gdb/mep-tdep.c                         |   1 -
 gdb/mips-tdep.c                        |   4 +-
 gdb/python/lib/gdb/disassembler.py     | 109 +++
 gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
 gdb/python/py-registers.c              |   1 -
 gdb/python/python-internal.h           |  16 +
 gdb/python/python.c                    |   3 +
 gdb/s12z-tdep.c                        |  27 +-
 gdb/target.h                           |   2 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
 gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
 28 files changed, 2451 insertions(+), 213 deletions(-)
 create mode 100644 gdb/disasm-flags.h
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4



* [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
@ 2022-04-04 22:19     ` Andrew Burgess
  2022-04-05 14:32       ` Tom Tromey
  2022-04-04 22:19     ` [PATCHv3 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
                       ` (5 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

While working on the disassembler I was getting frustrated.  Every
time I touched disasm.h it seemed like every file in GDB would need to
be rebuilt.  Surely the disassembler can't be required by that many
parts of GDB, right?

Turns out that disasm.h was included in target.h, so pretty much every
file was being rebuilt!

The only thing from disasm.h that target.h needed was the
gdb_disassembly_flag enum, as this is part of the target_ops API.

In this commit I move gdb_disassembly_flag into its own file, which is
then included in both target.h and disasm.h.  As a result, far fewer
files need to include disasm.h.

Now, after changing disasm.h, GDB rebuilds much more quickly.

There should be no user visible changes after this commit.
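The rebuild effect comes from transitive includes, which a toy model can show.  The sketch below is an assumption-laden illustration, not a description of GDB's real include graph: uses-target.c and uses-disasm.c are hypothetical source files, and only the target.h, disasm.h and disasm-flags.h edges reflect what this commit describes.

```python
def transitive_includes(includes, name):
    """Return every header reachable from NAME in the INCLUDES graph."""
    seen = set()
    stack = [name]
    while stack:
        for dep in includes.get(stack.pop(), ()):
            if dep not in seen:
                seen.add(dep)
                stack.append(dep)
    return seen


def rebuilt_after_change(includes, changed):
    """Return the .c files that must be rebuilt when CHANGED is edited."""
    return sorted(
        f
        for f in includes
        if f.endswith(".c") and changed in transitive_includes(includes, f)
    )


# Before this commit: target.h pulls in disasm.h.
BEFORE = {
    "target.h": ["disasm.h"],
    "disasm.h": [],
    "uses-target.c": ["target.h"],  # hypothetical file
    "uses-disasm.c": ["disasm.h"],  # hypothetical file
}

# After this commit: both headers include disasm-flags.h instead.
AFTER = {
    "target.h": ["disasm-flags.h"],
    "disasm.h": ["disasm-flags.h"],
    "disasm-flags.h": [],
    "uses-target.c": ["target.h"],  # hypothetical file
    "uses-disasm.c": ["disasm.h"],  # hypothetical file
}

print(rebuilt_after_change(BEFORE, "disasm.h"))
print(rebuilt_after_change(AFTER, "disasm.h"))
```

In this model, a file that only includes target.h stops depending on disasm.h once the enum lives in disasm-flags.h, though it would still be rebuilt if disasm-flags.h itself changed.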
---
 gdb/Makefile.in           |  1 +
 gdb/arc-linux-tdep.c      |  1 +
 gdb/disasm-flags.h        | 40 +++++++++++++++++++++++++++++++++++++++
 gdb/disasm.h              | 14 +-------------
 gdb/mep-tdep.c            |  1 -
 gdb/python/py-registers.c |  1 -
 gdb/s12z-tdep.c           |  1 +
 gdb/target.h              |  2 +-
 8 files changed, 45 insertions(+), 16 deletions(-)
 create mode 100644 gdb/disasm-flags.h

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index a775b2f4d19..647f012ad4f 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -1289,6 +1289,7 @@ HFILES_NO_SRCDIR = \
 	defs.h \
 	dicos-tdep.h \
 	dictionary.h \
+	disasm-flags.h \
 	disasm.h \
 	dummy-frame.h \
 	dwarf2/cu.h \
diff --git a/gdb/arc-linux-tdep.c b/gdb/arc-linux-tdep.c
index e895b72ce71..1744b7544cd 100644
--- a/gdb/arc-linux-tdep.c
+++ b/gdb/arc-linux-tdep.c
@@ -24,6 +24,7 @@
 #include "opcode/arc.h"
 #include "osabi.h"
 #include "solib-svr4.h"
+#include "disasm.h"
 
 /* ARC header files.  */
 #include "opcodes/arc-dis.h"
diff --git a/gdb/disasm-flags.h b/gdb/disasm-flags.h
new file mode 100644
index 00000000000..025b6893941
--- /dev/null
+++ b/gdb/disasm-flags.h
@@ -0,0 +1,40 @@
+/* Disassemble flags for GDB.
+
+   Copyright (C) 2002-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#ifndef DISASM_FLAGS_H
+#define DISASM_FLAGS_H
+
+#include "gdbsupport/enum-flags.h"
+
+/* Flags used to control how GDB's disassembler behaves.  */
+
+enum gdb_disassembly_flag
+  {
+    DISASSEMBLY_SOURCE_DEPRECATED = (0x1 << 0),
+    DISASSEMBLY_RAW_INSN = (0x1 << 1),
+    DISASSEMBLY_OMIT_FNAME = (0x1 << 2),
+    DISASSEMBLY_FILENAME = (0x1 << 3),
+    DISASSEMBLY_OMIT_PC = (0x1 << 4),
+    DISASSEMBLY_SOURCE = (0x1 << 5),
+    DISASSEMBLY_SPECULATIVE = (0x1 << 6),
+  };
+DEF_ENUM_FLAGS_TYPE (enum gdb_disassembly_flag, gdb_disassembly_flags);
+
+#endif /* DISASM_FLAGS_H */
+
diff --git a/gdb/disasm.h b/gdb/disasm.h
index b71cd097a16..7efab7db46c 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -20,19 +20,7 @@
 #define DISASM_H
 
 #include "dis-asm.h"
-#include "gdbsupport/enum-flags.h"
-
-enum gdb_disassembly_flag
-  {
-    DISASSEMBLY_SOURCE_DEPRECATED = (0x1 << 0),
-    DISASSEMBLY_RAW_INSN = (0x1 << 1),
-    DISASSEMBLY_OMIT_FNAME = (0x1 << 2),
-    DISASSEMBLY_FILENAME = (0x1 << 3),
-    DISASSEMBLY_OMIT_PC = (0x1 << 4),
-    DISASSEMBLY_SOURCE = (0x1 << 5),
-    DISASSEMBLY_SPECULATIVE = (0x1 << 6),
-  };
-DEF_ENUM_FLAGS_TYPE (enum gdb_disassembly_flag, gdb_disassembly_flags);
+#include "disasm-flags.h"
 
 struct gdbarch;
 struct ui_out;
diff --git a/gdb/mep-tdep.c b/gdb/mep-tdep.c
index 696d9c63bce..d16c68e6fb6 100644
--- a/gdb/mep-tdep.c
+++ b/gdb/mep-tdep.c
@@ -37,7 +37,6 @@
 #include "regcache.h"
 #include "remote.h"
 #include "sim-regno.h"
-#include "disasm.h"
 #include "trad-frame.h"
 #include "reggroups.h"
 #include "elf-bfd.h"
diff --git a/gdb/python/py-registers.c b/gdb/python/py-registers.c
index eab88a30b3b..975eb2ca72d 100644
--- a/gdb/python/py-registers.c
+++ b/gdb/python/py-registers.c
@@ -20,7 +20,6 @@
 #include "defs.h"
 #include "gdbarch.h"
 #include "arch-utils.h"
-#include "disasm.h"
 #include "reggroups.h"
 #include "python-internal.h"
 #include "user-regs.h"
diff --git a/gdb/s12z-tdep.c b/gdb/s12z-tdep.c
index bf6a71c7f7f..5394c1bbf5e 100644
--- a/gdb/s12z-tdep.c
+++ b/gdb/s12z-tdep.c
@@ -32,6 +32,7 @@
 #include "remote.h"
 #include "opcodes/s12z-opc.h"
 #include "gdbarch.h"
+#include "disasm.h"
 
 /* Two of the registers included in S12Z_N_REGISTERS are
    the CCH and CCL "registers" which are just views into
diff --git a/gdb/target.h b/gdb/target.h
index 4cc79df05b4..c9791e850ac 100644
--- a/gdb/target.h
+++ b/gdb/target.h
@@ -79,7 +79,7 @@ struct inferior;
 #include "btrace.h"
 #include "record.h"
 #include "command.h"
-#include "disasm.h"
+#include "disasm-flags.h"
 #include "tracepoint.h"
 
 #include "gdbsupport/break-common.h" /* For enum target_hw_bp_type.  */
-- 
2.25.4



* [PATCHv3 2/6] gdb: add new base class to gdb_disassembler
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file Andrew Burgess
@ 2022-04-04 22:19     ` Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 3/6] gdb: add extension language print_insn hook Andrew Burgess
                       ` (4 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

The motivation for this change is an upcoming Python disassembler API
that I would like to add.  As part of that change I need to create a
new disassembler-like class that contains a disassemble_info and a
gdbarch.  The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.

The problem, however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method.  In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.

What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.

My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.

In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.

Now, my new Python disassembler sub-class will still want to print
things to an output stream, and so we will want access to the
dis_asm_fprintf functionality for printing.

However, rather than move this printing code into the
gdb_disassemble_info base class, I have added yet another level of
hierarchy, a gdb_printing_disassembler, thus the class structure is
now:

  struct gdb_disassemble_info {};
  struct gdb_printing_disassembler : public gdb_disassemble_info {};
  struct gdb_disassembler : public gdb_printing_disassembler {};

In a later commit my new Python disassembler will inherit from
gdb_printing_disassembler.

The reason for adding the additional layer to the class hierarchy is
that in yet another commit I intend to rewrite the function
gdb_buffered_insn_length, and to do this I will be creating yet more
disassembler-like classes.  These will not print anything, so I will
add a gdb_non_printing_disassembler class that also inherits from
gdb_disassemble_info.  Knowing that that change is coming, I've gone
with the above class hierarchy now.
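
As a rough illustration of the split described above, here is a small
Python model of the three classes.  This is not GDB code: the real
classes are C++, their constructors are protected (which Python does
not model), and the names DisassembleInfoBase, PrintingDisassembler,
BufferedDisassembler, and the decode callback are hypothetical
stand-ins chosen for this sketch only.

```python
import io


class DisassembleInfoBase:
    """Models gdb_disassemble_info: owns only the arch and the stream."""

    def __init__(self, arch, stream):
        self._arch = arch
        self._stream = stream

    def arch(self):
        return self._arch


class PrintingDisassembler(DisassembleInfoBase):
    """Models gdb_printing_disassembler: adds a default print callback."""

    def fprintf_func(self, text):
        # Write to whatever stream the base class was given.
        self._stream.write(text)
        return len(text)


class BufferedDisassembler(PrintingDisassembler):
    """Models gdb_disassembler: buffers output, flushes it in print_insn."""

    def __init__(self, arch, dest, decode):
        self._buffer = io.StringIO()
        # The parent sees the private buffer, not the caller's stream.
        super().__init__(arch, self._buffer)
        self._dest = dest
        self._decode = decode  # hypothetical: (address, print_fn) -> length

    def print_insn(self, address):
        # Start each instruction with an empty buffer.
        self._buffer.seek(0)
        self._buffer.truncate()
        length = self._decode(address, self.fprintf_func)
        # Only here does the text reach the stream given to __init__.
        self._dest.write(self._buffer.getvalue())
        return length


def fake_decode(address, print_fn):
    """Hypothetical decoder: pretend every instruction is a 4-byte nop."""
    print_fn("nop")
    return 4


out = io.StringIO()
dis = BufferedDisassembler("riscv:rv64", out, fake_decode)
length = dis.print_insn(0x1000)
```

In this model a class that never calls print_insn can still sit on top
of PrintingDisassembler and write straight to its own stream, which is
the situation the new Python disassembler class will be in.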

There should be no user-visible changes after this commit.
---
 gdb/arm-tdep.c  |   4 +-
 gdb/disasm.c    |  51 +++++++++++-------
 gdb/disasm.h    | 140 ++++++++++++++++++++++++++++++++++++++----------
 gdb/mips-tdep.c |   4 +-
 4 files changed, 147 insertions(+), 52 deletions(-)

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index 4d0f3492410..2c647aa5598 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -7759,8 +7759,8 @@ arm_displaced_step_fixup (struct gdbarch *gdbarch,
 static int
 gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   if (arm_pc_is_thumb (gdbarch, memaddr))
diff --git a/gdb/disasm.c b/gdb/disasm.c
index f2df5ef7bc5..3f55e12665b 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -166,7 +166,8 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
 /* Format disassembler output to STREAM.  */
 
 int
-gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+gdb_printing_disassembler::fprintf_func (void *stream,
+					 const char *format, ...)
 {
   va_list args;
 
@@ -180,9 +181,9 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
 /* See disasm.h.  */
 
 int
-gdb_disassembler::dis_asm_styled_fprintf (void *stream,
-					  enum disassembler_style style,
-					  const char *format, ...)
+gdb_printing_disassembler::fprintf_styled_func (void *stream,
+						enum disassembler_style style,
+						const char *format, ...)
 {
   va_list args;
 
@@ -797,26 +798,34 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
-  : m_gdbarch (gdbarch),
+				    read_memory_ftype func)
+  : gdb_printing_disassembler (gdbarch, &m_buffer, func,
+			       dis_asm_memory_error, dis_asm_print_address),
     m_buffer (!use_ext_lang_colorization_p && disassembler_styling
 	      && file->can_emit_style_escape ()),
     m_dest (file)
+{ /* Nothing.  */ }
+
+/* See disasm.h.  */
+
+gdb_disassemble_info::gdb_disassemble_info
+  (struct gdbarch *gdbarch, struct ui_file *stream,
+   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
+   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
+   fprintf_styled_ftype fprintf_styled_func)
+    : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
-			 dis_asm_styled_fprintf);
+  gdb_assert (fprintf_func != nullptr);
+  gdb_assert (fprintf_styled_func != nullptr);
+  init_disassemble_info (&m_di, stream, fprintf_func,
+			 fprintf_styled_func);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
-  m_di.print_address_func = dis_asm_print_address;
-  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
-     disassembler had a local optimization here.  By default it would
-     access the executable file, instead of the target memory (there
-     was a growing list of exceptions though).  Unfortunately, the
-     heuristic was flawed.  Commands like "disassemble &variable"
-     didn't work as they relied on the access going to the target.
-     Further, it has been superseeded by trust-read-only-sections
-     (although that should be superseeded by target_trust..._p()).  */
-  m_di.read_memory_func = read_memory_func;
+  if (memory_error_func != nullptr)
+    m_di.memory_error_func = memory_error_func;
+  if (print_address_func != nullptr)
+    m_di.print_address_func = print_address_func;
+  if (read_memory_func != nullptr)
+    m_di.read_memory_func = read_memory_func;
   m_di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
   m_di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
   m_di.endian = gdbarch_byte_order (gdbarch);
@@ -828,7 +837,9 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
   disassemble_init_for_target (&m_di);
 }
 
-gdb_disassembler::~gdb_disassembler ()
+/* See disasm.h.  */
+
+gdb_disassemble_info::~gdb_disassemble_info ()
 {
   disassemble_free_target (&m_di);
 }
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 7efab7db46c..b3e40e2981e 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -26,43 +26,137 @@ struct gdbarch;
 struct ui_out;
 struct ui_file;
 
-class gdb_disassembler
-{
-  using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
-
-public:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
-    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
-  {}
+/* A wrapper around a disassemble_info and a gdbarch.  This is the core
+   set of data that all disassembler sub-classes will need.  This class
+   doesn't actually implement the disassembling process, that is something
+   that sub-classes will do, with each sub-class doing things slightly
+   differently.
 
-  ~gdb_disassembler ();
+   The constructor of this class is protected, you should not create
+   instances of this class directly, instead create an instance of an
+   appropriate sub-class.  */
 
-  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
-
-  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+struct gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
 
-  /* Return the gdbarch of gdb_disassembler.  */
+  /* Return the gdbarch we are disassembling for.  */
   struct gdbarch *arch ()
   { return m_gdbarch; }
 
+  /* Return a pointer to the disassemble_info, this will be needed for
+     passing into the libopcodes disassembler.  */
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 protected:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
 
+  /* Types for the function callbacks within m_di.  */
+  using read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using memory_error_ftype = decltype (disassemble_info::memory_error_func);
+  using print_address_ftype = decltype (disassemble_info::print_address_func);
+  using fprintf_ftype = decltype (disassemble_info::fprintf_func);
+  using fprintf_styled_ftype = decltype (disassemble_info::fprintf_styled_func);
+
+  /* Constructor, many fields in m_di are initialized from GDBARCH.  STREAM
+     is where the output of the disassembler will be written to, the
+     remaining arguments are function callbacks that are written into
+     m_di.  Of these function callbacks FPRINTF_FUNC and
+     FPRINTF_STYLED_FUNC must not be nullptr.  If READ_MEMORY_FUNC,
+     MEMORY_ERROR_FUNC, or PRINT_ADDRESS_FUNC are nullptr, then that field
+     within m_di is left with its default value (see the libopcodes
+     function init_disassemble_info for the defaults).  */
+  gdb_disassemble_info (struct gdbarch *gdbarch,
+			struct ui_file *stream,
+			read_memory_ftype read_memory_func,
+			memory_error_ftype memory_error_func,
+			print_address_ftype print_address_func,
+			fprintf_ftype fprintf_func,
+			fprintf_styled_ftype fprintf_styled_func);
+
+  /* Destructor.  */
+  virtual ~gdb_disassemble_info ();
+
+  /* The stream that disassembler output is being written to.  */
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
-private:
-  struct gdbarch *m_gdbarch;
-
   /* Stores data required for disassembling instructions in
      opcodes.  */
   struct disassemble_info m_di;
 
+private:
+  /* The architecture we are disassembling for.  */
+  struct gdbarch *m_gdbarch;
+
   /* If we own the string in `m_di.disassembler_options', we do so
      using this field.  */
   std::string m_disassembler_options_holder;
+};
+
+/* A wrapper around gdb_disassemble_info.  This class adds default
+   print functions that are supplied to the disassemble_info within the
+   parent class.  These default print functions write to the stream, which
+   is also contained in the parent class.
+
+   As with the parent class, the constructor for this class is protected,
+   you should not create instances of this class, but create an
+   appropriate sub-class instead.  */
 
+struct gdb_printing_disassembler : public gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_printing_disassembler);
+
+protected:
+
+  /* Constructor.  All the arguments are just passed to the parent class.
+     We also add the two print functions to the arguments passed to the
+     parent.  See gdb_disassemble_info for a description of how the
+     arguments are handled.  */
+  gdb_printing_disassembler (struct gdbarch *gdbarch,
+			     struct ui_file *stream,
+			     read_memory_ftype read_memory_func,
+			     memory_error_ftype memory_error_func,
+			     print_address_ftype print_address_func)
+    : gdb_disassemble_info (gdbarch, stream, read_memory_func,
+			    memory_error_func, print_address_func,
+			    fprintf_func, fprintf_styled_func)
+  { /* Nothing.  */ }
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_styled_func (void *stream,
+				  enum disassembler_style style,
+				  const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A disassembler class that provides 'print_insn', a method for
+   disassembling a single instruction to the output stream.  */
+
+struct gdb_disassembler : public gdb_printing_disassembler
+{
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
+    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
+
+  /* Disassemble a single instruction at MEMADDR to the ui_file* that was
+     passed to the constructor.  If a memory error occurs while
+     disassembling this instruction then an error will be thrown.  */
+  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+
+protected:
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    read_memory_ftype func);
+
+private:
   /* This member variable is given a value by calling dis_asm_memory_error.
      If after calling into the libopcodes disassembler we get back a
      negative value (which indicates an error), then, if this variable has
@@ -95,16 +189,6 @@ class gdb_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_fprintf (void *stream, const char *format, ...)
-    ATTRIBUTE_PRINTF(2,3);
-
-  /* Print formatted message to STREAM, the content can be styled based on
-     STYLE if desired.  */
-  static int dis_asm_styled_fprintf (void *stream,
-				     enum disassembler_style style,
-				     const char *format, ...)
-    ATTRIBUTE_PRINTF(3,4);
-
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index 93945891407..2d39a66da48 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7004,8 +7004,8 @@ reinit_frame_cache_sfunc (const char *args, int from_tty,
 static int
 gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
-- 
2.25.4



* [PATCHv3 3/6] gdb: add extension language print_insn hook
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-04-04 22:19     ` Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
                       ` (3 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit is preparation for the next commit.

In the next commit I will add a Python API to intercept the print_insn
calls within GDB.  Each print_insn call is responsible for
disassembling and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.
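
The control flow this commit sets up can be sketched in Python as
follows.  This is only a model of the C++ logic: the hook signature
(arch, address) is simplified, the disassemble_info argument is left
out, and declining_hook, two_byte_hook, and builtin are hypothetical
stand-ins invented for the example.

```python
from typing import Callable, Optional, Sequence

# A hook takes (arch, address) and returns a length, or None to decline.
PrintInsnHook = Callable[[str, int], Optional[int]]


def ext_lang_print_insn(hooks: Sequence[Optional[PrintInsnHook]],
                        arch: str, address: int) -> Optional[int]:
    """Ask each extension language in turn; the first to answer wins."""
    for hook in hooks:
        if hook is None:
            # This language has no print_insn callback installed.
            continue
        length = hook(arch, address)
        if length is not None:
            return length
    return None


def print_insn(hooks, arch, address, arch_print_insn):
    """Try the extension languages first, then fall back to the arch."""
    length = ext_lang_print_insn(hooks, arch, address)
    if length is not None:
        return length
    return arch_print_insn(arch, address)


def declining_hook(arch, address):
    """Hypothetical hook that never handles the instruction."""
    return None


def two_byte_hook(arch, address):
    """Hypothetical hook that claims every instruction is 2 bytes."""
    return 2


def builtin(arch, address):
    """Stand-in for gdbarch_print_insn: every instruction is 4 bytes."""
    return 4
```

With this commit alone, both the Python and Guile slots are empty, so
the fallback path is always taken, which is why nothing changes for the
user yet.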

There should be no user-visible change after this commit.
---
 gdb/disasm.c         | 29 ++++++++++++++++++++++++++---
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 17 +++++++++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 3f55e12665b..16e3c39b702 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -844,6 +844,29 @@ gdb_disassemble_info::~gdb_disassemble_info ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 /* See disasm.h.  */
 
 bool gdb_disassembler::use_ext_lang_colorization_p = true;
@@ -857,7 +880,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
   m_err_memaddr.reset ();
   m_buffer.clear ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   /* If we have successfully disassembled an instruction, styling is on, we
      think that the extension language might be able to perform styling for
@@ -892,7 +915,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
 	  gdb_assert (!m_buffer.term_out ());
 	  m_buffer.~string_file ();
 	  new (&m_buffer) string_file (true);
-	  length = gdbarch_print_insn (arch (), memaddr, &m_di);
+	  length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 	  gdb_assert (length > 0);
 	}
     }
@@ -1047,7 +1070,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index d9450b51231..7c74e721c57 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -263,6 +263,21 @@ struct extension_language_ops
      contents, or an empty optional.  */
   gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
 						 gdbarch *gdbarch);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassemble_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 8f39b86e952..62f41c6445d 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	(extlang->ops->print_insn (gdbarch, address, info));
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 7eb89530c44..f7518f91b35 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -327,6 +327,23 @@ extern gdb::optional<std::string> ext_lang_colorize
 extern gdb::optional<std::string> ext_lang_colorize_disasm
   (const std::string &content, gdbarch *gdbarch);
 
+/* Try to disassemble a single instruction.  ADDRESS is the
+   instruction's apparent address, though bytes for the instruction
+   should be read by calling INFO->read_memory_func as we might be
+   disassembling out of a buffer.  GDBARCH is the architecture in which we
+   are performing the disassembly.
+
+   The disassembled instruction should be printed by calling
+   INFO->fprintf_func, and the length (in octets) of the disassembled
+   instruction should be returned.
+
+   If no instruction could be disassembled then an empty value is returned
+   and GDB will call gdbarch_print_insn to perform the disassembly
+   itself.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #if GDB_SELF_TEST
 namespace selftests {
 extern void (*hook_set_active_ext_lang) ();
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index c7be48fb739..14b191ded62 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 7a9c8c1b66e..faa69d6fdfe 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -166,6 +166,8 @@ static const struct extension_language_ops python_extension_ops =
   gdbpy_colorize,
 
   gdbpy_colorize_disasm,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 #endif /* HAVE_PYTHON */
-- 
2.25.4



* [PATCHv3 4/6] gdb/python: implement the print_insn extension language hook
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
                       ` (2 preceding siblings ...)
  2022-04-04 22:19     ` [PATCHv3 3/6] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-04-04 22:19     ` Andrew Burgess
  2022-04-05 12:04       ` Eli Zaretskii
  2022-04-04 22:19     ` [PATCHv3 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
                       ` (2 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to the libopcodes disassemble_info structure.  It has
  read-only properties address, architecture, and progspace, and
  methods read_memory and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user-written disassembler function.  By
  reading the properties and calling the methods (and other support
  methods in the gdb.disassembler module), the user can perform and
  return the disassembly.

  Disassembler

      This is a base-class which user written disassemblers should
  inherit from.  This base class just provides base implementations of
  __init__ and __call__ which the user written disassembler should
  override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler; it's really just a wrapper around a string (the text
  of the disassembled instruction) and a length (in bytes).  The user
  can return an instance of this class from Disassembler.__call__ to
  represent the newly disassembled instruction.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, so the user can adjust what GDB
  sees when it is disassembling.

The included documentation provides a more detailed description of the
API.
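
To give a feel for the "augment the builtin output" use case, here is
a minimal sketch that runs outside GDB.  StandInResult, StandInInfo,
and fake_builtin are hypothetical stand-ins for DisassemblerResult,
DisassembleInfo, and builtin_disassemble; the (length, string)
argument order and the notes mapping are assumptions made for the
example, not something this message specifies.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StandInResult:
    """Stand-in for DisassemblerResult: instruction length and text."""
    length: int
    string: str


@dataclass(frozen=True)
class StandInInfo:
    """Stand-in for DisassembleInfo; only 'address' is modelled here."""
    address: int


class AnnotatingDisassembler:
    """Callable shaped like the __call__ of a Disassembler sub-class."""

    def __init__(self, builtin_disassemble, make_result, notes):
        self._builtin = builtin_disassemble
        self._make_result = make_result
        self._notes = notes  # hypothetical: maps address -> comment text

    def __call__(self, info):
        # Let the normal disassembler do the real work first.
        result = self._builtin(info)
        note = self._notes.get(info.address)
        if note is None:
            return result
        # Output must stay on one line, so append rather than add lines.
        return self._make_result(result.length,
                                 result.string + "\t## " + note)


def fake_builtin(info):
    """Stand-in for builtin_disassemble: every instruction is 'nop'."""
    return StandInResult(4, "nop")


dis = AnnotatingDisassembler(fake_builtin, StandInResult,
                             {0x1000: "entry point"})
annotated = dis(StandInInfo(0x1000))
plain = dis(StandInInfo(0x1004))
```

Inside GDB the same shape would presumably derive from
gdb.disassembler.Disassembler, call
gdb.disassembler.builtin_disassemble on the incoming DisassembleInfo,
and be installed with gdb.disassembler.register_disassembler; see the
included documentation for the exact signatures.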
---
 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  34 +
 gdb/data-directory/Makefile.in         |   1 +
 gdb/doc/python.texi                    | 248 +++++++
 gdb/python/lib/gdb/disassembler.py     | 109 +++
 gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  16 +
 gdb/python/python.c                    |   3 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
 gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
 11 files changed, 2012 insertions(+), 1 deletion(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 647f012ad4f..79d00a78c77 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index d76157128be..123b98ea304 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -18,6 +18,40 @@
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is a sub-class of gdb.disassembler.Disassembler.
+       ARCH is either None or a string containing a bfd architecture
+       name.  DISASSEMBLER is registered as a disassembler for
+       architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned, this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'architecture', and 'progspace', as well
+       as the following methods: 'read_memory' and 'is_valid'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index 7c414b01d70..b30173dcfbb 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6558,6 +6561,251 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex python instruction disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.  This class has the following properties and
+methods:
+
+@defivar DisassembleInfo address
+An integer containing the address at which @value{GDBN} wishes to
+disassemble a single instruction.
+@end defivar
+
+@defivar DisassembleInfo architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defivar DisassembleInfo progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defmethod DisassembleInfo read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} (zero by default) from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).
+@end defmethod
+
+@defmethod DisassembleInfo is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defmethod
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defmethod Disassembler __init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.  Currently, this name is only used in some
+debug output.
+@end defmethod
+
+@defmethod Disassembler __call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Or, this function can return an object that represents the
+disassembled instruction.  The object must have the following two
+attributes:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+The @code{DisassemblerResult} type is defined as a possible class to
+represent disassembled instructions, but it is not required to use
+this type, so long as the required attributes are present.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory; this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error; @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defmethod
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class can be used to hold the result of calling
+@w{@code{Disassembler.__call__}}.  It is not required to use this
+type; any type with the required attributes will do.
+
+The required properties, which this class provides, are:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+This class also provides a constructor:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialise an instance of this class.  @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+@end defun
+
+@defun builtin_disassemble (info, memory_source)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo}.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then a @code{gdb.MemoryError}
+exception will be raised.
+
+The optional @var{memory_source} argument has the default value of
+@code{None}, in which case the builtin disassembler will read the
+instruction from memory in the normal way.
+
+If @var{memory_source} is not @code{None}, then it should be an
+instance of a class that implements the following method:
+
+@defmethod memory_source read_memory (length, offset)
+This method will be called by the builtin disassembler to fetch bytes
+of the instruction being disassembled.  @var{length} is the number of
+bytes to fetch, and @var{offset} is the offset from the address of the
+instruction being disassembled; this address is obtained from
+@code{DisassembleInfo.address}.
+
+This function should return a Python object that supports the buffer
+protocol, i.e.@: a string, an array, or the object returned from
+@code{DisassembleInfo.read_memory}.
+
+The length of the returned buffer @emph{must} be @var{length}
+otherwise a @code{ValueError} exception will be raised.
+
+Alternatively, this function can raise a @code{gdb.MemoryError}
+exception to indicate that the read failed.  Raising any other
+exception type is an error.
+
+It is important to understand that, even when this function raises a
+@code{gdb.MemoryError}, it is the internal disassembler itself that
+reports the memory error to @value{GDBN}.  The reason for this is that
+the disassembler might probe memory to see if a byte is readable or
+not; if the byte can't be read then the disassembler may choose not to
+report an error, but instead to disassemble the bytes that it does
+have available.
+@end defmethod
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        if result.string is not None:
+            length = result.length
+            text = result.string + "\t## Comment"
+            return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
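
As an illustration of the memory_source argument documented above,
here is a minimal sketch of a memory source that overlays replacement
bytes on top of what DisassembleInfo.read_memory returns.  The names
OverlayMemory and FakeInfo are hypothetical and not part of this
patch.  FakeInfo only stands in for a DisassembleInfo so that the
sketch runs outside GDB, and converting the returned buffer with
bytes() is an assumption based on it supporting the buffer protocol.

```python
class OverlayMemory:
    """Hypothetical memory source: serves bytes read through INFO, with
    some offsets replaced by values from OVERLAY, a dict mapping an
    offset from info.address to an integer byte value."""

    def __init__(self, info, overlay):
        self._info = info
        self._overlay = overlay

    def read_memory(self, length, offset=0):
        # Fetch the original bytes through INFO rather than from the
        # inferior directly, as the documentation asks.
        data = bytearray(bytes(self._info.read_memory(length, offset)))
        for i in range(length):
            if offset + i in self._overlay:
                data[i] = self._overlay[offset + i]
        # The returned buffer is exactly LENGTH bytes long.
        return bytes(data)


class FakeInfo:
    """Stand-in for DisassembleInfo, used only to exercise the sketch
    outside GDB."""

    def __init__(self, address, contents):
        self.address = address
        self._contents = contents

    def read_memory(self, length, offset=0):
        return self._contents[offset:offset + length]


info = FakeInfo(0x1000, b"\x90\x90\xc3\x00")
source = OverlayMemory(info, {1: 0xCC})
patched = source.read_memory(3, 0)
```

Inside a Disassembler.__call__ method, such an object would be passed
as the second argument of gdb.disassembler.builtin_disassemble, in
place of the FakeInfo based setup shown here.
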
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..19ec0ecf82f
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,109 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler, or None to remove any
+    existing registration.  ARCHITECTURE is either None or a string,
+    the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered for ARCHITECTURE.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality; this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except:
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
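
The lookup order implemented by _print_insn above can be modelled
outside GDB.  This is a sketch only: registry, register and lookup
are stand-ins that mirror _disassemblers_dict, register_disassembler
and lookup_disassembler, without the type check or the _set_enabled
call.  The architecture names and the string values are placeholders
for real architecture names and Disassembler instances.

```python
# Stand-in for _disassemblers_dict: keys are architecture names, or
# None for the global disassembler.
registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE and return the previous
    entry; passing None as DISASSEMBLER removes the entry."""
    old = registry.pop(architecture, None)
    if disassembler is not None:
        registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer the architecture specific entry, else the global one."""
    if arch_name is None:
        return None
    if arch_name in registry:
        return registry[arch_name]
    return registry.get(None)


first = register("global-dis")         # nothing registered before
second = register("i386-dis", "i386")  # nothing registered for i386
for_i386 = lookup("i386")              # architecture specific wins
for_other = lookup("aarch64")          # falls back to the global one
replaced = register(None, "i386")      # unregister, returns old entry
after_removal = lookup("i386")         # the global one is used again
```

The order of the two register calls does not matter, since the
architecture specific and global entries live under different keys.
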
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..e8b33fecee4
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,970 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object {
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object {
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling _gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  PyObject *address_obj = gdb_py_object_from_longest (address).release ();
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj);
+}
+
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))					\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  if (!disasm_info_object_is_valid (disasm_info))
+    {
+      PyErr_SetString (PyExc_RuntimeError,
+		       _("DisassembleInfo is no longer valid."));
+      return nullptr;
+    }
+
+  /* A memory source is any object that provides the 'read_memory'
+     callback.  At this point we only check for the existence of a
+     'read_memory' attribute, if this isn't callable then we'll throw an
+     exception from within gdbpy_disassembler::read_memory_func.  */
+  if (memory_source_obj != nullptr)
+    {
+      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
+	{
+	  PyErr_SetString (PyExc_TypeError,
+			   _("memory_source doesn't have a read_memory method"));
+	  return nullptr;
+	}
+    }
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers that don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create an object to represent the result of the disassembler.  */
+  gdbpy_ref<disasm_result_object> res
+    (PyObject_New (disasm_result_object, &disasm_result_object_type));
+  res->length = length;
+  res->content = new string_file;
+  *(res->content) = disassembler.release ();
+
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement _gdb.disassembler._set_enabled.  Takes a boolean parameter,
+   and sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback.  This
+   will either call the standard read memory function, or, if the user has
+   supplied a memory source (see disasmpy_builtin_disassemble) then this
+   will call back into Python to obtain the memory contents.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+  PyObject *memory_source = dis->m_memory_source;
+
+  /* The simple case, the user didn't pass a separate memory source, so we
+     just delegate to the standard disassemble_info read_memory_func,
+     passing in the original disassemble_info object, which core GDB might
+     require in order to read the instruction bytes (when reading the
+     instruction from a buffer).  */
+  if (memory_source == nullptr)
+    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
+
+  /* The user provided a separate memory source, we need to call the
+     read_memory method on the memory source and use the buffer it returns
+     as the bytes of memory.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
+					       "IL", len, (long long) offset));
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read, if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Result from read_memory is incorrectly sized buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  obj->length = length;
+  obj->content->write (string, strlen (string));
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying DisassembleInfo Python object, and set a memory error on
+   it.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    m_disasm_info->address = memaddr;
+    m_disasm_info->gdb_info = info;
+    m_disasm_info->gdbarch = gdbarch;
+    m_disasm_info->program_space = current_program_space;
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    m_disasm_info->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* The attribute we are going to lookup that provides the print_insn
+     functionality.  */
+  static const char *callback_name = "_print_insn";
+
+  /* Grab a reference to the gdb.disassembler module, and check it has the
+     attribute that we need.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr
+      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
+				  callback_name))
+    return {};
+
+  /* Now grab the callback attribute from the module.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     callback_name));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print the stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fallback to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      PyObject *addr_obj = PyObject_GetAttrString (error_value,
+							   "address");
+	      if (get_addr_from_python (addr_obj, &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("string attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0 || length > max_insn_length)
+    {
+      PyErr_SetString (PyExc_ValueError, _("Invalid length attribute."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in.", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm
+(void)
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  /* Having the tp_new field as nullptr means that this class can't be
+     created from user code.  The only way they can be created is from
+     within GDB, and then they are passed into user code.  */
+  gdb_assert (disasm_info_object_type.tp_new == nullptr);
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  0,						/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,				/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset			/* tp_getset */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
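
The contract that read_memory_func (above) imposes on a user-supplied
memory source can be modelled in plain Python.  This is only an
illustrative sketch that runs outside GDB: BytesMemorySource,
MemoryReadError and read_from_source are hypothetical names that are not
part of this patch, and MemoryReadError merely stands in for
gdb.MemoryError.  Unlike the real code, the sketch does not print a stack
trace for non-memory errors.

```python
class MemoryReadError(Exception):
    """Stand-in for gdb.MemoryError (hypothetical, for use outside GDB)."""


class BytesMemorySource:
    """Hypothetical memory source backed by a bytes object."""

    def __init__(self, data):
        self._data = bytes(data)

    def read_memory(self, length, offset=0):
        # OFFSET is relative to the address of the instruction being
        # disassembled, which is how read_memory_func computes it.
        if offset < 0 or offset + length > len(self._data):
            raise MemoryReadError(
                "cannot read %d bytes at offset %d" % (length, offset)
            )
        return self._data[offset : offset + length]


def read_from_source(source, insn_address, memaddr, length):
    """Model of read_memory_func.  Return (status, data); status is 0 on
    success and -1 on failure, in which case data is None."""
    offset = memaddr - insn_address
    try:
        result = source.read_memory(length, offset)
    except MemoryReadError:
        # A memory error is reported to the caller as a failed read.
        return -1, None
    try:
        view = memoryview(result)
    except TypeError:
        # The result does not support the buffer protocol.
        return -1, None
    if view.nbytes != length:
        # The buffer is not the size that was asked for.
        return -1, None
    return 0, view.tobytes()
```

For instance, reading two bytes at address 0x1001 from a three byte
source based at 0x1000 succeeds, while asking for bytes beyond the end of
the source reports a failed read rather than raising.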
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
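
The checks that gdbpy_print_insn applies to the object returned from
Python, before it accepts the result, can be summarised with a small
stand-alone model.  This is a sketch under stated assumptions, not code
from the patch: Result and check_result are hypothetical names, the
gdb.MemoryError path is not modelled, and where the real code prints a
Python stack trace the model simply returns None.

```python
import sys


class Result:
    """Hypothetical stand-in for gdb.disassembler.DisassemblerResult."""

    def __init__(self, length, string):
        self.length = length
        self.string = string


def check_result(result, max_insn_length=sys.maxsize):
    """Return (length, string) if RESULT would be accepted, or None if
    core GDB should fall back to its own disassembler."""
    if result is None:
        # None means "could not, or did not want to, disassemble".
        return None
    length = getattr(result, "length", None)
    string = getattr(result, "string", None)
    if not isinstance(string, str) or string == "":
        # The string attribute must be a non-empty string.
        return None
    if not isinstance(length, int):
        return None
    if length <= 0 or length > max_insn_length:
        # The length must be positive and no larger than the
        # architecture's maximum instruction length, when known.
        return None
    return length, string
```

In this model every rejected result falls back to the builtin
disassembler, which corresponds to the empty gdb::optional<int> returned
by the C++ implementation.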
diff --git a/gdb/python/python.c b/gdb/python/python.c
index faa69d6fdfe..66caef07397 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..ea7847fc6df
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,150 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements, the
+# first element is the name of a class in py-disasm.py, this is a
+# disassembler class.  The second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(analyzing_disassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
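
The curr_pc_pattern substitution used in py-disasm.exp (turning 0x1234
into the regexp 0x0*1234 so that it matches zero-padded addresses) is
easy to check in isolation.  The following is a rough Python equivalent
of that one Tcl line; pc_to_pattern is a hypothetical helper name used
only for illustration.

```python
import re


def pc_to_pattern(pc):
    """Turn a hex string such as '0x1234' into the regexp '0x0*1234'."""
    if not pc.startswith("0x"):
        raise ValueError("expected a hex string starting with 0x")
    # Replace the leading "0x" with "0x0*", as the Tcl
    # 'string replace ${curr_pc} 0 1 "0x0*"' call does.
    return "0x0*" + pc[2:]


pattern = pc_to_pattern("0x1234")
padded = "0x0000000000001234"
matches_padded = re.fullmatch(pattern, padded) is not None
```

The resulting pattern matches both the unpadded and the zero-padded form
of the address, but not a different address that merely ends in the same
digits.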
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..a05244dbb1b
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,456 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super(TestDisassembler, self).__init__("TestDisassembler")
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        return self.disassemble(info)
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        len = result.length
+        str = ""
+        for o in range(len):
+            if str != "":
+                str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super(TaggingDisassembler, self).__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in. Once
+    the call into the disassembler is complete then the DisassembleInfo
+    becomes invalid, and any calls into it should trigger an
+    exception."""
+
+    # This is where we cache the DisassembleInfo object.
+    cached_insn_disas = None
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas = info
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        info = GlobalCachingDisassembler.cached_insn_disas
+        assert isinstance(info, gdb.disassembler.DisassembleInfo)
+        assert not info.is_valid()
+        try:
+            val = info.address
+            raise gdb.GdbError("DisassembleInfo.address is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+        try:
+            val = info.architecture
+            raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError(
+                "DisassembleInfo.architecture raised an unexpected exception"
+            )
+
+        try:
+            val = info.read_memory(1, 0)
+            raise gdb.GdbError("DisassembleInfo.read is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.read raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        # Throw a memory error with a specific address.  We don't
+        # expect this address to show up in the output though.
+        raise gdb.MemoryError(0x1234)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        raise gdb.GdbError("the memory source failed")
+
+
+class AnalyzingDisassembler(Disassembler):
+    def __init__(self, name):
+        """Constructor."""
+        super(AnalyzingDisassembler, self).__init__(name)
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # The DisassembleInfo object passed into __call__ as INFO.
+        self._info = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Record INFO, we'll need to refer to this in READ_MEMORY which is
+        # called back to by the builtin disassembler.
+        self._info = info
+        result = gdb.disassembler.builtin_disassemble(info, self)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def _read_replacement(self, length, offset):
+        """Return a slice of the buffer representing the replacement nop
+        instructions."""
+
+        assert self._nop_bytes is not None
+        rb = self._nop_bytes
+
+        # If this request is outside of a nop instruction then we don't know
+        # what to do, so just raise a memory error.
+        if offset >= len(rb) or (offset + length) > len(rb):
+            raise gdb.MemoryError("invalid length and offset combination")
+
+        # Return only the slice of the nop instruction as requested.
+        s = offset
+        e = offset + length
+        return rb[s:e]
+
+    def read_memory(self, len, offset):
+        """Callback used from the builtin disassembler to read the contents of
+        memory."""
+
+        info = self._info
+        assert info is not None
+
+        # If this request is within the region we are replacing with 'nop'
+        # instructions, then call the helper function to perform that
+        # replacement.
+        if self._start is not None:
+            assert self._end is not None
+            if info.address >= self._start and info.address < self._end:
+                return self._read_replacement(len, offset)
+
+        # Otherwise, we just forward this request to the default read memory
+        # implementation.
+        return info.read_memory(len, offset)
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = self._pass_1_length[replace_idx] // self._pass_1_length[nop_idx]
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+# Create a global instance of the AnalyzingDisassembler.  This isn't
+# registered as a disassembler yet though, that is done from the
+# py-disasm.exp later.
+analyzing_disassembler = AnalyzingDisassembler("AnalyzingDisassembler")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv3 5/6] gdb: refactor the non-printing disassemblers
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
                       ` (3 preceding siblings ...)
  2022-04-04 22:19     ` [PATCHv3 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-04-04 22:19     ` Andrew Burgess
  2022-04-04 22:19     ` [PATCHv3 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit started from an observation I made while working on some
other disassembler patches: the function gdb_buffered_insn_length is
broken ... sort of.

I noticed that the gdb_buffered_insn_length function doesn't set up
the application_data field of the disassemble_info structure.

Further, I noticed that some architectures, for example, ARM, require
that the application_data field be set, see gdb_print_insn_arm in
arm-tdep.c.

And so, if we ever use gdb_buffered_insn_length for ARM, then GDB will
likely crash.  Which is why I said only "sort of" broken.  Right now
we don't use gdb_buffered_insn_length with ARM, so maybe it isn't
broken yet?
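
To make the failure mode concrete, here is a toy model in Python rather
than GDB code.  It assumes the architecture callback reaches its owning
object through application_data; every name in it (ToyDisassembleInfo,
ToyDisassembler, print_insn_needing_owner) is hypothetical and only
stands in for the real C++ structures.

```python
class ToyDisassembleInfo:
    """Toy stand-in for libopcodes' disassemble_info (hypothetical)."""

    def __init__(self, application_data=None):
        # Owner object that the architecture callback expects to find here.
        self.application_data = application_data


class ToyDisassembler:
    """Toy stand-in for the object that owns a disassemble_info."""

    def __init__(self, arch_name):
        self.arch_name = arch_name
        # A properly constructed disassembler points the info back at itself.
        self.info = ToyDisassembleInfo(application_data=self)


def print_insn_needing_owner(info):
    """Model of an architecture callback that dereferences application_data."""
    owner = info.application_data
    # In C++ this is a cast followed by a member access; with a null
    # pointer that is undefined behaviour, here it is an AttributeError.
    return owner.arch_name


# Set up through the owning class: the callback finds what it needs.
good = ToyDisassembler("arm")
print(print_insn_needing_owner(good.info))

# Set up by hand, without an owner: application_data was never set.
bare = ToyDisassembleInfo()
try:
    print_insn_needing_owner(bare)
except AttributeError:
    print("callback failed: application_data not set")
```

The point of the model is only that the callback cannot tell a bare
structure from a properly owned one until it dereferences the field.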

Anyway, to prove to myself that there was a problem here I extended the
disassembler self tests in disasm-selftests.c to include a test of
gdb_buffered_insn_length.  As the test runs for all architectures, I
do indeed see GDB crash for ARM.

To fix this we need gdb_buffered_insn_length to create a disassembler
that inherits from gdb_disassemble_info, but we also need this new
disassembler to not print anything.

And so, I introduce a new gdb_non_printing_disassembler class: a
disassembler that doesn't print anything to the output stream.

I then observed that both ARC and S12Z also create non-printing
disassemblers, but these are slightly different.  While the
disassembler in gdb_non_printing_disassembler reads the instruction
from a buffer, the ARC and S12Z disassemblers read from target memory
using target_read_code.

And so, I further split gdb_non_printing_disassembler into two
sub-classes, gdb_non_printing_memory_disassembler and
gdb_non_printing_buffer_disassembler.
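
The shape of that split can be sketched as a toy Python model (not the
C++ in this patch).  The class names mirror the ones above, but the
method names and constructor arguments are assumptions made for the
illustration: the base class discards output, and each sub-class only
decides where the instruction bytes come from.

```python
class NonPrintingDisassembler:
    """Base: discards all output; sub-classes decide where bytes come from."""

    def fprintf(self, fmt, *args):
        # Print nothing and report zero characters written.
        return 0

    def read_memory(self, address, length):
        raise NotImplementedError("sub-classes supply the byte source")


class NonPrintingMemoryDisassembler(NonPrintingDisassembler):
    """Reads instruction bytes through a target-memory reader callable."""

    def __init__(self, read_target):
        self._read_target = read_target

    def read_memory(self, address, length):
        return self._read_target(address, length)


class NonPrintingBufferDisassembler(NonPrintingDisassembler):
    """Reads instruction bytes from a caller-supplied buffer."""

    def __init__(self, buffer, insn_address):
        self._buffer = bytes(buffer)
        self._base = insn_address

    def read_memory(self, address, length):
        # Translate the requested address into an offset within the buffer.
        start = address - self._base
        if start < 0 or start + length > len(self._buffer):
            raise IndexError("read outside the instruction buffer")
        return self._buffer[start:start + length]


# The ARM "mov r0, #0" bytes used by the selftest, placed at address 0x1000.
buf_dis = NonPrintingBufferDisassembler(b"\x00\x00\xa0\xe3", 0x1000)
print(buf_dis.read_memory(0x1002, 2).hex())

# A fake target memory of 16 zero bytes stands in for target_read_code.
fake_memory = bytes(16)
mem_dis = NonPrintingMemoryDisassembler(lambda a, n: fake_memory[a:a + n])
print(len(mem_dis.read_memory(4, 4)))
```

Keeping the byte source as the only difference between the two
sub-classes is what lets ARC, S12Z and gdb_buffered_insn_length share
the same non-printing base.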

The new selftests now pass, but otherwise, there should be no
user-visible changes after this commit.
---
 gdb/arc-linux-tdep.c   | 15 +++----
 gdb/arc-tdep.c         | 29 +++-----------
 gdb/arc-tdep.h         |  5 ---
 gdb/disasm-selftests.c | 70 ++++++++++++++++++++++++++-------
 gdb/disasm.c           | 88 ++++++++++++++++++------------------------
 gdb/disasm.h           | 56 ++++++++++++++++++++++++---
 gdb/s12z-tdep.c        | 26 +------------
 7 files changed, 158 insertions(+), 131 deletions(-)

diff --git a/gdb/arc-linux-tdep.c b/gdb/arc-linux-tdep.c
index 1744b7544cd..b658921ed94 100644
--- a/gdb/arc-linux-tdep.c
+++ b/gdb/arc-linux-tdep.c
@@ -356,7 +356,7 @@ arc_linux_sw_breakpoint_from_kind (struct gdbarch *gdbarch,
    */
 
 static std::vector<CORE_ADDR>
-handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
+handle_atomic_sequence (arc_instruction insn, disassemble_info *di)
 {
   const int atomic_seq_len = 24;    /* Instruction sequence length.  */
   std::vector<CORE_ADDR> next_pcs;
@@ -374,7 +374,7 @@ handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
   for (int insn_count = 0; insn_count < atomic_seq_len; ++insn_count)
     {
       arc_insn_decode (arc_insn_get_linear_next_pc (insn),
-		       &di, arc_delayed_print_insn, &insn);
+		       di, arc_delayed_print_insn, &insn);
 
       if (insn.insn_class == BRCC)
         {
@@ -412,15 +412,15 @@ arc_linux_software_single_step (struct regcache *regcache)
 {
   struct gdbarch *gdbarch = regcache->arch ();
   arc_gdbarch_tdep *tdep = (arc_gdbarch_tdep *) gdbarch_tdep (gdbarch);
-  struct disassemble_info di = arc_disassemble_info (gdbarch);
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   /* Read current instruction.  */
   struct arc_instruction curr_insn;
-  arc_insn_decode (regcache_read_pc (regcache), &di, arc_delayed_print_insn,
-		   &curr_insn);
+  arc_insn_decode (regcache_read_pc (regcache), dis.disasm_info (),
+		   arc_delayed_print_insn, &curr_insn);
 
   if (curr_insn.insn_class == LLOCK)
-    return handle_atomic_sequence (curr_insn, di);
+    return handle_atomic_sequence (curr_insn, dis.disasm_info ());
 
   CORE_ADDR next_pc = arc_insn_get_linear_next_pc (curr_insn);
   std::vector<CORE_ADDR> next_pcs;
@@ -431,7 +431,8 @@ arc_linux_software_single_step (struct regcache *regcache)
   if (curr_insn.has_delay_slot)
     {
       struct arc_instruction next_insn;
-      arc_insn_decode (next_pc, &di, arc_delayed_print_insn, &next_insn);
+      arc_insn_decode (next_pc, dis.disasm_info (), arc_delayed_print_insn,
+		       &next_insn);
       next_pcs.push_back (arc_insn_get_linear_next_pc (next_insn));
     }
   else
diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
index d6da2886c49..e4716c04b7f 100644
--- a/gdb/arc-tdep.c
+++ b/gdb/arc-tdep.c
@@ -1306,24 +1306,6 @@ arc_is_in_prologue (struct gdbarch *gdbarch, const struct arc_instruction &insn,
   return false;
 }
 
-/* See arc-tdep.h.  */
-
-struct disassemble_info
-arc_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
 /* Analyze the prologue and update the corresponding frame cache for the frame
    unwinder for unwinding frames that doesn't have debug info.  In such
    situation GDB attempts to parse instructions in the prologue to understand
@@ -1394,9 +1376,10 @@ arc_analyze_prologue (struct gdbarch *gdbarch, const CORE_ADDR entrypoint,
   while (current_prologue_end < limit_pc)
     {
       struct arc_instruction insn;
-      struct disassemble_info di = arc_disassemble_info (gdbarch);
-      arc_insn_decode (current_prologue_end, &di, arc_delayed_print_insn,
-		       &insn);
+
+      struct gdb_non_printing_memory_disassembler dis (gdbarch);
+      arc_insn_decode (current_prologue_end, dis.disasm_info (),
+		       arc_delayed_print_insn, &insn);
 
       if (arc_debug)
 	arc_insn_dump (insn);
@@ -2477,8 +2460,8 @@ dump_arc_instruction_command (const char *args, int from_tty)
 
   CORE_ADDR address = value_as_address (val);
   struct arc_instruction insn;
-  struct disassemble_info di = arc_disassemble_info (target_gdbarch ());
-  arc_insn_decode (address, &di, arc_delayed_print_insn, &insn);
+  struct gdb_non_printing_memory_disassembler dis (target_gdbarch ());
+  arc_insn_decode (address, dis.disasm_info (), arc_delayed_print_insn, &insn);
   arc_insn_dump (insn);
 }
 
diff --git a/gdb/arc-tdep.h b/gdb/arc-tdep.h
index ceca003204f..53e5d8476fc 100644
--- a/gdb/arc-tdep.h
+++ b/gdb/arc-tdep.h
@@ -186,11 +186,6 @@ arc_arch_is_em (const struct bfd_arch_info* arch)
    can't be set to an actual NULL value - that would cause a crash.  */
 int arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info);
 
-/* Return properly initialized disassemble_info for ARC disassembler - it will
-   not print disassembled instructions to stderr.  */
-
-struct disassemble_info arc_disassemble_info (struct gdbarch *gdbarch);
-
 /* Get branch/jump target address for the INSN.  Note that this function
    returns branch target and doesn't evaluate if this branch is taken or not.
    For the indirect jumps value depends in register state, hence can change.
diff --git a/gdb/disasm-selftests.c b/gdb/disasm-selftests.c
index 928d26f7018..07586f04abd 100644
--- a/gdb/disasm-selftests.c
+++ b/gdb/disasm-selftests.c
@@ -25,13 +25,19 @@
 
 namespace selftests {
 
-/* Test disassembly of one instruction.  */
+/* Return a pointer to a buffer containing an instruction that can be
+   disassembled for architecture GDBARCH.  *LEN will be set to the length
+   of the returned buffer.
 
-static void
-print_one_insn_test (struct gdbarch *gdbarch)
+   If there's no known instruction to disassemble for GDBARCH (because we
+   haven't figured one out, not because no instructions exist) then nullptr
+   is returned, and *LEN is set to 0.  */
+
+static const gdb_byte *
+get_test_insn (struct gdbarch *gdbarch, size_t *len)
 {
-  size_t len = 0;
-  const gdb_byte *insn = NULL;
+  *len = 0;
+  const gdb_byte *insn = nullptr;
 
   switch (gdbarch_bfd_arch_info (gdbarch)->arch)
     {
@@ -40,34 +46,34 @@ print_one_insn_test (struct gdbarch *gdbarch)
       static const gdb_byte bfin_insn[] = {0x17, 0xe1, 0xff, 0xff};
 
       insn = bfin_insn;
-      len = sizeof (bfin_insn);
+      *len = sizeof (bfin_insn);
       break;
     case bfd_arch_arm:
       /* mov     r0, #0 */
       static const gdb_byte arm_insn[] = {0x0, 0x0, 0xa0, 0xe3};
 
       insn = arm_insn;
-      len = sizeof (arm_insn);
+      *len = sizeof (arm_insn);
       break;
     case bfd_arch_ia64:
     case bfd_arch_mep:
     case bfd_arch_mips:
     case bfd_arch_tic6x:
     case bfd_arch_xtensa:
-      return;
+      return insn;
     case bfd_arch_s390:
       /* nopr %r7 */
       static const gdb_byte s390_insn[] = {0x07, 0x07};
 
       insn = s390_insn;
-      len = sizeof (s390_insn);
+      *len = sizeof (s390_insn);
       break;
     case bfd_arch_xstormy16:
       /* nop */
       static const gdb_byte xstormy16_insn[] = {0x0, 0x0};
 
       insn = xstormy16_insn;
-      len = sizeof (xstormy16_insn);
+      *len = sizeof (xstormy16_insn);
       break;
     case bfd_arch_nios2:
     case bfd_arch_score:
@@ -78,13 +84,13 @@ print_one_insn_test (struct gdbarch *gdbarch)
       {
 	int bplen;
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, 4, &bplen);
-	len = bplen;
+	*len = bplen;
       }
       break;
     case bfd_arch_arc:
       /* PR 21003 */
       if (gdbarch_bfd_arch_info (gdbarch)->mach == bfd_mach_arc_arc601)
-	return;
+	return insn;
       goto generic_case;
     case bfd_arch_i386:
       {
@@ -93,7 +99,7 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	   opcodes rejects an attempt to disassemble for an arch with
 	   a 64-bit address size when bfd_vma is 32-bit.  */
 	if (info->bits_per_address > sizeof (bfd_vma) * CHAR_BIT)
-	  return;
+	  return insn;
       }
       /* fall through */
     default:
@@ -105,12 +111,26 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	int bplen;
 
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, kind, &bplen);
-	len = bplen;
+	*len = bplen;
 
 	break;
       }
     }
-  SELF_CHECK (len > 0);
+  SELF_CHECK (*len > 0);
+
+  return insn;
+}
+
+/* Test disassembly of one instruction.  */
+
+static void
+print_one_insn_test (struct gdbarch *gdbarch)
+{
+  size_t len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &len);
+
+  if (insn == nullptr)
+    return;
 
   /* Test gdb_disassembler for a given gdbarch by reading data from a
      pre-allocated buffer.  If you want to see the disassembled
@@ -175,6 +195,24 @@ print_one_insn_test (struct gdbarch *gdbarch)
   SELF_CHECK (di.print_insn (0) == len);
 }
 
+/* Test the gdb_buffered_insn_length function.  */
+
+static void
+buffered_insn_length_test (struct gdbarch *gdbarch)
+{
+  size_t buf_len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &buf_len);
+
+  if (insn == nullptr)
+    return;
+
+  CORE_ADDR insn_address = 0;
+  int calculated_len = gdb_buffered_insn_length (gdbarch, insn, buf_len,
+						 insn_address);
+
+  SELF_CHECK (calculated_len == buf_len);
+}
+
 /* Test disassembly on memory error.  */
 
 static void
@@ -235,4 +273,6 @@ _initialize_disasm_selftests ()
 					 selftests::print_one_insn_test);
   selftests::register_test_foreach_arch ("memory_error",
 					 selftests::memory_error_test);
+  selftests::register_test_foreach_arch ("buffered_insn_length",
+					 selftests::buffered_insn_length_test);
 }
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 16e3c39b702..1dfa141b10b 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -996,66 +996,56 @@ gdb_insn_length (struct gdbarch *gdbarch, CORE_ADDR addr)
   return gdb_print_insn (gdbarch, addr, &null_stream, NULL);
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything.  Always returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (2, 3)
-gdb_disasm_null_printf (void *stream, const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_func (void *stream,
+						  const char *format, ...)
 {
   return 0;
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything, and the disassembler is using style.  Always
-   returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (3, 4)
-gdb_disasm_null_styled_printf (void *stream,
-			       enum disassembler_style style,
-			       const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_styled_func
+  (void *stream, enum disassembler_style style, const char *format, ...)
 {
   return 0;
 }
 
 /* See disasm.h.  */
 
-void
-init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
+int
+gdb_non_printing_memory_disassembler::dis_asm_read_memory
+  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+   struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
-			 gdb_disasm_null_styled_printf);
+  return target_read_code (memaddr, myaddr, length);
 }
 
-/* Initialize a struct disassemble_info for gdb_buffered_insn_length.
-   Upon return, *DISASSEMBLER_OPTIONS_HOLDER owns the string pointed
-   to by DI.DISASSEMBLER_OPTIONS.  */
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from a buffer passed to the constructor.  */
 
-static void
-gdb_buffered_insn_length_init_dis (struct gdbarch *gdbarch,
-				   struct disassemble_info *di,
-				   const gdb_byte *insn, int max_len,
-				   CORE_ADDR addr,
-				   std::string *disassembler_options_holder)
+struct gdb_non_printing_buffer_disassembler
+  : public gdb_non_printing_disassembler
 {
-  init_disassemble_info_for_no_printing (di);
-
-  /* init_disassemble_info installs buffer_read_memory, etc.
-     so we don't need to do that here.
-     The cast is necessary until disassemble_info is const-ified.  */
-  di->buffer = (gdb_byte *) insn;
-  di->buffer_length = max_len;
-  di->buffer_vma = addr;
-
-  di->arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di->mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di->endian = gdbarch_byte_order (gdbarch);
-  di->endian_code = gdbarch_byte_order_for_code (gdbarch);
-
-  *disassembler_options_holder = get_all_disassembler_options (gdbarch);
-  if (!disassembler_options_holder->empty ())
-    di->disassembler_options = disassembler_options_holder->c_str ();
-  disassemble_init_for_target (di);
-}
+  /* Constructor.  GDBARCH is the architecture to disassemble for, BUFFER
+     contains the instruction to disassemble, and INSN_ADDRESS is the
+     address (in target memory) of the instruction to disassemble.  */
+  gdb_non_printing_buffer_disassembler (struct gdbarch *gdbarch,
+					gdb::array_view<const gdb_byte> buffer,
+					CORE_ADDR insn_address)
+    : gdb_non_printing_disassembler (gdbarch, nullptr)
+  {
+    /* The cast is necessary until disassemble_info is const-ified.  */
+    m_di.buffer = (gdb_byte *) buffer.data ();
+    m_di.buffer_length = buffer.size ();
+    m_di.buffer_vma = insn_address;
+  }
+};
 
 /* Return the length in bytes of INSN.  MAX_LEN is the size of the
    buffer containing INSN.  */
@@ -1064,14 +1054,10 @@ int
 gdb_buffered_insn_length (struct gdbarch *gdbarch,
 			  const gdb_byte *insn, int max_len, CORE_ADDR addr)
 {
-  struct disassemble_info di;
-  std::string disassembler_options_holder;
-
-  gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
-				     &disassembler_options_holder);
-
-  int result = gdb_print_insn_1 (gdbarch, addr, &di);
-  disassemble_free_target (&di);
+  gdb::array_view<const gdb_byte> buffer
+    = gdb::make_array_view (insn, max_len);
+  gdb_non_printing_buffer_disassembler dis (gdbarch, buffer, addr);
+  int result = gdb_print_insn_1 (gdbarch, addr, dis.disasm_info ());
   return result;
 }
 
diff --git a/gdb/disasm.h b/gdb/disasm.h
index b3e40e2981e..6c1d7673b01 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -136,6 +136,56 @@ struct gdb_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* A basic disassembler that doesn't actually print anything.  */
+
+struct gdb_non_printing_disassembler : public gdb_disassemble_info
+{
+  gdb_non_printing_disassembler (struct gdbarch *gdbarch,
+				 read_memory_ftype read_memory_func)
+    : gdb_disassemble_info (gdbarch, nullptr /* stream */,
+			    read_memory_func,
+			    nullptr /* memory_error_func */,
+			    nullptr /* print_address_func */,
+			    null_fprintf_func,
+			    null_fprintf_styled_func)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_styled_func (void *stream,
+				       enum disassembler_style style,
+				       const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from target memory.  */
+
+struct gdb_non_printing_memory_disassembler
+  : public gdb_non_printing_disassembler
+{
+  /* Constructor.  GDBARCH is the architecture to disassemble for.  */
+  gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
+    :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
@@ -278,10 +328,4 @@ extern char *get_disassembler_options (struct gdbarch *gdbarch);
 
 extern void set_disassembler_options (const char *options);
 
-/* Setup DINFO with its output function and output stream setup so that
-   nothing is printed while disassembling.  */
-
-extern void init_disassemble_info_for_no_printing
-  (struct disassemble_info *dinfo);
-
 #endif
diff --git a/gdb/s12z-tdep.c b/gdb/s12z-tdep.c
index 5394c1bbf5e..4e33faaea9a 100644
--- a/gdb/s12z-tdep.c
+++ b/gdb/s12z-tdep.c
@@ -141,27 +141,6 @@ s12z_dwarf_reg_to_regnum (struct gdbarch *gdbarch, int num)
 
 /* Support functions for frame handling.  */
 
-
-/* Return a disassemble_info initialized for s12z disassembly, however,
-   the disassembler will not actually print anything.  */
-
-static struct disassemble_info
-s12z_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
-
 /* A struct (based on mem_read_abstraction_base) to read memory
    through the disassemble_info API.  */
 struct mem_read_abstraction
@@ -332,15 +311,14 @@ s12z_frame_cache (struct frame_info *this_frame, void **prologue_cache)
   int frame_size = 0;
   int saved_frame_size = 0;
 
-  struct disassemble_info di = s12z_disassemble_info (gdbarch);
-
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   struct mem_read_abstraction mra;
   mra.base.read = (int (*)(mem_read_abstraction_base*,
 			   int, size_t, bfd_byte*)) abstract_read_memory;
   mra.base.advance = advance ;
   mra.base.posn = posn;
-  mra.info = &di;
+  mra.info = dis.disasm_info ();
 
   while (this_pc > addr)
     {
-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv3 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
                       ` (4 preceding siblings ...)
  2022-04-04 22:19     ` [PATCHv3 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
@ 2022-04-04 22:19     ` Andrew Burgess
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-04 22:19 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

After the recent restructuring of the disassembler code, GDB has ended
up with two class static functions, both called dis_asm_read_memory,
with identical implementations.

My first thought was to move these out of their respective classes,
and just make them global functions, then I'd only need a single
copy.

And maybe that's the right way to go.  But I disliked that, by doing
that, I lose the encapsulation of the method with the corresponding
disassembler class.

So, instead, I placed the static method into its own class, and had
both the gdb_non_printing_memory_disassembler and gdb_disassembler
classes inherit from this new class as an additional base-class.

In terms of code generated, I don't think there's any significant
difference with this approach, but I think this better reflects how
the function is closely tied to the disassembler.

There should be no user visible changes after this commit.
---
 gdb/disasm.c | 16 +++-------------
 gdb/disasm.h | 29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 25 deletions(-)
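
As a rough illustration of the approach described above (this is not
part of the patch), the sketch below shows two otherwise unrelated
classes picking up one copy of a static callback from a private helper
base class.  All names here (memory_reader, disassembler_base,
printing_sketch, non_printing_sketch, fake_memory) are hypothetical
stand-ins, and the callback signature is simplified compared to the
real read_memory_func.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical, simplified stand-in for the read_memory_func callback type.
using read_memory_ftype = int (*) (std::uint64_t memaddr,
				   unsigned char *myaddr,
				   unsigned int len);

// Pretend target memory, so the sketch needs no real target.
static const unsigned char fake_memory[] = { 0x90, 0x90, 0xc3, 0x00 };

// Helper base class: holds the single shared copy of the static callback.
struct memory_reader
{
  // Copy LEN bytes starting at MEMADDR into MYADDR.  Return 0 on
  // success, or -1 if the request falls outside fake_memory.
  static int read_memory (std::uint64_t memaddr, unsigned char *myaddr,
			  unsigned int len)
  {
    if (memaddr > sizeof (fake_memory)
	|| len > sizeof (fake_memory) - memaddr)
      return -1;
    std::memcpy (myaddr, fake_memory + memaddr, len);
    return 0;
  }
};

// Common base that stores whichever callback the sub-class hands over.
struct disassembler_base
{
  explicit disassembler_base (read_memory_ftype func)
    : m_read (func)
  { /* Nothing.  */ }

  int read (std::uint64_t addr, unsigned char *buf, unsigned int len) const
  { return m_read (addr, buf, len); }

private:
  read_memory_ftype m_read;
};

// Both classes inherit privately from memory_reader, so the callback
// stays scoped to the disassembler classes instead of becoming global.
struct printing_sketch
  : public disassembler_base, private memory_reader
{
  printing_sketch ()
    : disassembler_base (read_memory)
  { /* Nothing.  */ }
};

struct non_printing_sketch
  : public disassembler_base, private memory_reader
{
  non_printing_sketch ()
    : disassembler_base (read_memory)
  { /* Nothing.  */ }
};
```

Because the inheritance is private, code outside the two classes
cannot reach read_memory through them, which is the encapsulation
that a plain global function would give up.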

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 1dfa141b10b..563cef8b845 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -132,9 +132,9 @@ line_has_code_p (htab_t table, struct symtab *symtab, int line)
 /* Wrapper of target_read_code.  */
 
 int
-gdb_disassembler::dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				       unsigned int len,
-				       struct disassemble_info *info)
+gdb_disassembler_memory_reader::dis_asm_read_memory
+  (bfd_vma memaddr, gdb_byte *myaddr, unsigned int len,
+   struct disassemble_info *info)
 {
   return target_read_code (memaddr, myaddr, len);
 }
@@ -1014,16 +1014,6 @@ gdb_non_printing_disassembler::null_fprintf_styled_func
   return 0;
 }
 
-/* See disasm.h.  */
-
-int
-gdb_non_printing_memory_disassembler::dis_asm_read_memory
-  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
-   struct disassemble_info *dinfo)
-{
-  return target_read_code (memaddr, myaddr, length);
-}
-
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 6c1d7673b01..5d1112cf0d6 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -165,31 +165,39 @@ struct gdb_non_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* This is a helper class, for use as an additional base-class, by some of
+   the disassembler classes below.  This class just defines a static method
+   for reading from target memory, which can then be used by the various
+   disassembler sub-classes.  */
+
+struct gdb_disassembler_memory_reader
+{
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
    read from target memory.  */
 
 struct gdb_non_printing_memory_disassembler
-  : public gdb_non_printing_disassembler
+  : public gdb_non_printing_disassembler,
+    private gdb_disassembler_memory_reader
 {
   /* Constructor.  GDBARCH is the architecture to disassemble for.  */
   gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
     :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
   { /* Nothing.  */ }
-
-private:
-
-  /* Implements the read_memory_func disassemble_info callback.  */
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
 };
 
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
-struct gdb_disassembler : public gdb_printing_disassembler
+struct gdb_disassembler : public gdb_printing_disassembler,
+			  private gdb_disassembler_memory_reader
 {
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
     : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
@@ -239,9 +247,6 @@ struct gdb_disassembler : public gdb_printing_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
   static void dis_asm_memory_error (int err, bfd_vma memaddr,
 				    struct disassemble_info *info);
   static void dis_asm_print_address (bfd_vma addr,
-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv3 4/6] gdb/python: implement the print_insn extension language hook
  2022-04-04 22:19     ` [PATCHv3 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-04-05 12:04       ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2022-04-05 12:04 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> Date: Mon,  4 Apr 2022 23:19:57 +0100
> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
> 
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -18,6 +18,40 @@
>       This is the same format that GDB uses when printing address, symbol,
>       and offset information from the disassembler.
>  
> +  ** New Python API for wrapping GDB's disassembler:
> +
> +     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
> +       DISASSEMBLER is a sub-class of gdb.disassembler.Disassembler.
> +       ARCH is either None or a string containing a bfd architecture
> +       name.  DISASSEMBLER is registered as a disassembler for
> +       architecture ARCH, or for all architectures if ARCH is None.
> +       The previous disassembler registered for ARCH is returned, this
> +       can be None if no previous disassembler was registered.
> +
> +     - gdb.disassembler.Disassembler is the class from which all
> +       disassemblers should inherit.  Its constructor takes a string,
> +       a name for the disassembler, which is currently only used is
> +       some debug output.                                        ^^

A typo. 

> +Or, this function can return an object that represents the
   ^^^^^^^^^^^^^^^^^
It is better to say "Alternatively, this function ..."

> +The required properties, which this class provides are:
                                                     ^
A comma is missing there.

> +@cindex disassembler in Python, global vs.@: specific
> +@cindex search order for disassembler in Python
> +@cindex look up of disassembler in Python
> +
> +@value{GDBN} only records a single disassembler for each architecture,

There shouldn't be an empty line between the @cindex entries and the
following text, so that the position recorded in the index is the
beginning of the text, not a newline away.

The documentation parts are approved, once those nits are fixed.

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file
  2022-04-04 22:19     ` [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file Andrew Burgess
@ 2022-04-05 14:32       ` Tom Tromey
  2022-04-06 12:18         ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Tom Tromey @ 2022-04-05 14:32 UTC (permalink / raw)
  To: Andrew Burgess via Gdb-patches; +Cc: Andrew Burgess

>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:

Andrew> In this commit I move gdb_disassembly_flag into its own file.  This is
Andrew> then included in target.h and disasm.h, after which, the number of
Andrew> files that need to include disasm.h is much reduced.

Andrew> Now, after changing disasm.h, GDB rebuilds much quicker.

Looks good to me.

Tom

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file
  2022-04-05 14:32       ` Tom Tromey
@ 2022-04-06 12:18         ` Andrew Burgess
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-06 12:18 UTC (permalink / raw)
  To: Tom Tromey, Andrew Burgess via Gdb-patches

Tom Tromey <tom@tromey.com> writes:

>>>>>> "Andrew" == Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org> writes:
>
> Andrew> In this commit I move gdb_disassembly_flag into its own file.  This is
> Andrew> then included in target.h and disasm.h, after which, the number of
> Andrew> files that need to include disasm.h is much reduced.
>
> Andrew> Now, after changing disasm.h, GDB rebuilds much quicker.
>
> Looks good to me.

Thanks.  I've gone ahead and pushed this patch on its own for now.

Andrew


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv4 0/5] Add Python API for the disassembler
  2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
                       ` (5 preceding siblings ...)
  2022-04-04 22:19     ` [PATCHv3 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
@ 2022-04-25  9:15     ` Andrew Burgess
  2022-04-25  9:15       ` [PATCHv4 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
                         ` (6 more replies)
  6 siblings, 7 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-25  9:15 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes in v4:

  - Patch #1 from v3 series has been merged,

  - Addressed Eli's feedback on previous series,

  - Rebased onto current upstream/master.

Changes in v3:

  - Rebased to current master, and retested,

  - Patch #1 is new in this series,

  - Patch #2 is changed slightly from v2, I've reworked the
    disassembler classes in a slightly different way now, in order to
    prepare for patches #5 and #6.

  - Patch #3 is unchanged from v2,

  - Patch #4 is unchanged from v2,

  - Patch #5 is new in v3.  I've included it here as the changes in #2
    only make sense knowing that patch #5 is coming,

  - Patch #6 is a small cleanup only possible after #2 and #5 have landed.

Changes in v2:

  - The first 3 patches from the v1 series were merged a while back,
    these were all refactoring, or auxiliary features,

  - There's a new #1 patch in the v2 series that does some new
    refactoring of GDB's disassembler classes, this was required in
    order to simplify the #3 patch,

  - Patch #2 in the v2 series is largely unchanged from patch #4 in
    the v1 series,

  - The syntax highlighting work that was in the v1 series was spun
    out into its own patch, and has been merged separately,

  - The format_address helper function that appeared in the v1 series,
    and that Simon suggested I make more general, was spun out into
    its own patch, and merged separately,

  - Finally, patch #3 in the v2 series is pretty much a complete
    rewrite from the v1 series in order to follow the approach
    suggested by Simon.  Results are now returned directly, either via
    'return' or by raising an exception, in contrast to the original
    approach which involved "setting" the result into an existing
    state object.

---

Andrew Burgess (5):
  gdb: add new base class to gdb_disassembler
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook
  gdb: refactor the non-printing disassemblers
  gdb: unify two dis_asm_read_memory functions in disasm.c

 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  34 +
 gdb/arc-linux-tdep.c                   |  15 +-
 gdb/arc-tdep.c                         |  29 +-
 gdb/arc-tdep.h                         |   5 -
 gdb/arm-tdep.c                         |   4 +-
 gdb/data-directory/Makefile.in         |   1 +
 gdb/disasm-selftests.c                 |  70 +-
 gdb/disasm.c                           | 172 ++---
 gdb/disasm.h                           | 207 +++++-
 gdb/doc/python.texi                    | 247 +++++++
 gdb/extension-priv.h                   |  15 +
 gdb/extension.c                        |  20 +
 gdb/extension.h                        |  17 +
 gdb/guile/guile.c                      |   6 +-
 gdb/mips-tdep.c                        |   4 +-
 gdb/python/lib/gdb/disassembler.py     | 109 +++
 gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  16 +
 gdb/python/python.c                    |   3 +
 gdb/s12z-tdep.c                        |  26 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
 gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
 24 files changed, 2405 insertions(+), 197 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv4 1/5] gdb: add new base class to gdb_disassembler
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
@ 2022-04-25  9:15       ` Andrew Burgess
  2022-05-03 13:34         ` Simon Marchi
  2022-04-25  9:15       ` [PATCHv4 2/5] gdb: add extension language print_insn hook Andrew Burgess
                         ` (5 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-04-25  9:15 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

The motivation for this change is an upcoming Python disassembler API
that I would like to add.  As part of that change I need to create a
new disassembler-like class that contains a disassemble_info and a
gdbarch.  The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.

The problem, however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method.  In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.

What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.

My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.

In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.

Now, my new Python disassembler sub-class will still want to print
things to an output stream, and so we will want access to the
dis_asm_fprintf functionality for printing.

However, rather than move this printing code into the
gdb_disassemble_info base class, I have added yet another level of
hierarchy, a gdb_printing_disassembler, thus the class structure is
now:

  struct gdb_disassemble_info {};
  struct gdb_printing_disassembler : public gdb_disassemble_info {};
  struct gdb_disassembler : public gdb_printing_disassembler {};

In a later commit my new Python disassembler will inherit from
gdb_printing_disassembler.

The reason for adding the additional layer to the class hierarchy is
that in yet another commit I intend to rewrite the function
gdb_buffered_insn_length, and to do this I will be creating yet more
disassembler-like classes.  These will not print anything, so I will
add a gdb_non_printing_disassembler class that also inherits from
gdb_disassemble_info.  Knowing that this change is coming, I've gone
with the above class hierarchy now.

There should be no user visible changes after this commit.
---
 gdb/arm-tdep.c  |   4 +-
 gdb/disasm.c    |  51 +++++++++++-------
 gdb/disasm.h    | 140 ++++++++++++++++++++++++++++++++++++++----------
 gdb/mips-tdep.c |   4 +-
 4 files changed, 147 insertions(+), 52 deletions(-)
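
For readers who want to see the shape of the hierarchy in isolation,
here is a minimal, self-contained sketch (not part of the patch).  It
assumes a toy fake_disassemble_info in place of the libopcodes
structure, a fake_print_insn in place of the real disassembler, and
a std::string in place of a ui_file; the class names info_holder,
printing_disassembler, simple_disassembler and
non_printing_disassembler are likewise hypothetical.

```cpp
#include <cstdarg>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical stand-in for the few disassemble_info fields used here.
struct fake_disassemble_info
{
  void *stream = nullptr;
  int (*fprintf_func) (void *, const char *, ...) = nullptr;
};

// Stand-in for the libopcodes print_insn routine: "disassembles" one
// instruction by printing through the callback, and returns its length.
static int
fake_print_insn (fake_disassemble_info *di)
{
  di->fprintf_func (di->stream, "%s", "nop");
  return 1;
}

// Level 1: owns the info object and the architecture, nothing more.
struct info_holder
{
  const std::string &arch () const { return m_arch; }
  fake_disassemble_info *disasm_info () { return &m_di; }

protected:
  using fprintf_ftype = int (*) (void *, const char *, ...);

  // Protected, so only sub-classes can be instantiated.
  info_holder (std::string arch, void *stream, fprintf_ftype func)
    : m_arch (std::move (arch))
  {
    m_di.stream = stream;
    m_di.fprintf_func = func;
  }

  virtual ~info_holder () = default;

  fake_disassemble_info m_di;

private:
  std::string m_arch;
};

// Level 2: supplies a print callback that appends to the stream.
struct printing_disassembler : public info_holder
{
protected:
  printing_disassembler (std::string arch, std::string *stream)
    : info_holder (std::move (arch), stream, fprintf_func)
  { /* Nothing.  */ }

  static int fprintf_func (void *stream, const char *format, ...)
  {
    char buf[128];
    va_list args;
    va_start (args, format);
    int n = std::vsnprintf (buf, sizeof (buf), format, args);
    va_end (args);
    if (n > 0)
      static_cast<std::string *> (stream)->append (buf);
    return n;
  }
};

// Level 3: a concrete class that offers print_insn.
struct simple_disassembler : public printing_disassembler
{
  simple_disassembler (std::string arch, std::string *out)
    : printing_disassembler (std::move (arch), out)
  { /* Nothing.  */ }

  int print_insn () { return fake_print_insn (&m_di); }
};

// Sibling of level 2: same data management, but output is discarded.
struct non_printing_disassembler : public info_holder
{
  explicit non_printing_disassembler (std::string arch)
    : info_holder (std::move (arch), nullptr, null_fprintf_func)
  { /* Nothing.  */ }

private:
  static int null_fprintf_func (void *, const char *, ...) { return 0; }
};
```

The point of the split is that the non-printing sibling reuses the
data management from the base class without inheriting any output
behaviour it would have to work around.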

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index cc7773914d7..a4ffc566a45 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -7874,8 +7874,8 @@ arm_displaced_step_fixup (struct gdbarch *gdbarch,
 static int
 gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   if (arm_pc_is_thumb (gdbarch, memaddr))
diff --git a/gdb/disasm.c b/gdb/disasm.c
index f2df5ef7bc5..3f55e12665b 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -166,7 +166,8 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
 /* Format disassembler output to STREAM.  */
 
 int
-gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+gdb_printing_disassembler::fprintf_func (void *stream,
+					 const char *format, ...)
 {
   va_list args;
 
@@ -180,9 +181,9 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
 /* See disasm.h.  */
 
 int
-gdb_disassembler::dis_asm_styled_fprintf (void *stream,
-					  enum disassembler_style style,
-					  const char *format, ...)
+gdb_printing_disassembler::fprintf_styled_func (void *stream,
+						enum disassembler_style style,
+						const char *format, ...)
 {
   va_list args;
 
@@ -797,26 +798,34 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
-  : m_gdbarch (gdbarch),
+				    read_memory_ftype func)
+  : gdb_printing_disassembler (gdbarch, &m_buffer, func,
+			       dis_asm_memory_error, dis_asm_print_address),
     m_buffer (!use_ext_lang_colorization_p && disassembler_styling
 	      && file->can_emit_style_escape ()),
     m_dest (file)
+{ /* Nothing.  */ }
+
+/* See disasm.h.  */
+
+gdb_disassemble_info::gdb_disassemble_info
+  (struct gdbarch *gdbarch, struct ui_file *stream,
+   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
+   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
+   fprintf_styled_ftype fprintf_styled_func)
+    : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
-			 dis_asm_styled_fprintf);
+  gdb_assert (fprintf_func != nullptr);
+  gdb_assert (fprintf_styled_func != nullptr);
+  init_disassemble_info (&m_di, stream, fprintf_func,
+			 fprintf_styled_func);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
-  m_di.print_address_func = dis_asm_print_address;
-  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
-     disassembler had a local optimization here.  By default it would
-     access the executable file, instead of the target memory (there
-     was a growing list of exceptions though).  Unfortunately, the
-     heuristic was flawed.  Commands like "disassemble &variable"
-     didn't work as they relied on the access going to the target.
-     Further, it has been superseeded by trust-read-only-sections
-     (although that should be superseeded by target_trust..._p()).  */
-  m_di.read_memory_func = read_memory_func;
+  if (memory_error_func != nullptr)
+    m_di.memory_error_func = memory_error_func;
+  if (print_address_func != nullptr)
+    m_di.print_address_func = print_address_func;
+  if (read_memory_func != nullptr)
+    m_di.read_memory_func = read_memory_func;
   m_di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
   m_di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
   m_di.endian = gdbarch_byte_order (gdbarch);
@@ -828,7 +837,9 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
   disassemble_init_for_target (&m_di);
 }
 
-gdb_disassembler::~gdb_disassembler ()
+/* See disasm.h.  */
+
+gdb_disassemble_info::~gdb_disassemble_info ()
 {
   disassemble_free_target (&m_di);
 }
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 7efab7db46c..b3e40e2981e 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -26,43 +26,137 @@ struct gdbarch;
 struct ui_out;
 struct ui_file;
 
-class gdb_disassembler
-{
-  using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
-
-public:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
-    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
-  {}
+/* A wrapper around a disassemble_info and a gdbarch.  This is the core
+   set of data that all disassembler sub-classes will need.  This class
+   doesn't actually implement the disassembling process, that is something
+   that sub-classes will do, with each sub-class doing things slightly
+   differently.
 
-  ~gdb_disassembler ();
+   The constructor of this class is protected, you should not create
+   instances of this class directly, instead create an instance of an
+   appropriate sub-class.  */
 
-  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
-
-  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+struct gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
 
-  /* Return the gdbarch of gdb_disassembler.  */
+  /* Return the gdbarch we are disassembling for.  */
   struct gdbarch *arch ()
   { return m_gdbarch; }
 
+  /* Return a pointer to the disassemble_info, this will be needed for
+     passing into the libopcodes disassembler.  */
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 protected:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
 
+  /* Types for the function callbacks within m_di.  */
+  using read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using memory_error_ftype = decltype (disassemble_info::memory_error_func);
+  using print_address_ftype = decltype (disassemble_info::print_address_func);
+  using fprintf_ftype = decltype (disassemble_info::fprintf_func);
+  using fprintf_styled_ftype = decltype (disassemble_info::fprintf_styled_func);
+
+  /* Constructor, many fields in m_di are initialized from GDBARCH.  STREAM
+     is where the output of the disassembler will be written to, the
+     remaining arguments are function callbacks that are written into
+     m_di.  Of these function callbacks FPRINTF_FUNC and
+     FPRINTF_STYLED_FUNC must not be nullptr.  If READ_MEMORY_FUNC,
+     MEMORY_ERROR_FUNC, or PRINT_ADDRESS_FUNC are nullptr, then that field
+     within m_di is left with its default value (see the libopcodes
+     function init_disassemble_info for the defaults).  */
+  gdb_disassemble_info (struct gdbarch *gdbarch,
+			struct ui_file *stream,
+			read_memory_ftype read_memory_func,
+			memory_error_ftype memory_error_func,
+			print_address_ftype print_address_func,
+			fprintf_ftype fprintf_func,
+			fprintf_styled_ftype fprintf_styled_func);
+
+  /* Destructor.  */
+  virtual ~gdb_disassemble_info ();
+
+  /* The stream that disassembler output is being written to.  */
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
-private:
-  struct gdbarch *m_gdbarch;
-
   /* Stores data required for disassembling instructions in
      opcodes.  */
   struct disassemble_info m_di;
 
+private:
+  /* The architecture we are disassembling for.  */
+  struct gdbarch *m_gdbarch;
+
   /* If we own the string in `m_di.disassembler_options', we do so
      using this field.  */
   std::string m_disassembler_options_holder;
+};
+
+/* A wrapper around gdb_disassemble_info.  This class adds default
+   print functions that are supplied to the disassemble_info within the
+   parent class.  These default print functions write to the stream, which
+   is also contained in the parent class.
+
+   As with the parent class, the constructor for this class is protected,
+   you should not create instances of this class, but create an
+   appropriate sub-class instead.  */
 
+struct gdb_printing_disassembler : public gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_printing_disassembler);
+
+protected:
+
+  /* Constructor.  All the arguments are just passed to the parent class.
+     We also add the two print functions to the arguments passed to the
+     parent.  See gdb_disassemble_info for a description of how the
+     arguments are handled.  */
+  gdb_printing_disassembler (struct gdbarch *gdbarch,
+			     struct ui_file *stream,
+			     read_memory_ftype read_memory_func,
+			     memory_error_ftype memory_error_func,
+			     print_address_ftype print_address_func)
+    : gdb_disassemble_info (gdbarch, stream, read_memory_func,
+			    memory_error_func, print_address_func,
+			    fprintf_func, fprintf_styled_func)
+  { /* Nothing.  */ }
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_styled_func (void *stream,
+				  enum disassembler_style style,
+				  const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A dissassembler class that provides 'print_insn', a method for
+   disassembling a single instruction to the output stream.  */
+
+struct gdb_disassembler : public gdb_printing_disassembler
+{
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
+    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
+
+  /* Disassemble a single instruction at MEMADDR to the ui_file* that was
+     passed to the constructor.  If a memory error occurs while
+     disassembling this instruction then an error will be thrown.  */
+  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+
+protected:
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    read_memory_ftype func);
+
+private:
   /* This member variable is given a value by calling dis_asm_memory_error.
      If after calling into the libopcodes disassembler we get back a
      negative value (which indicates an error), then, if this variable has
@@ -95,16 +189,6 @@ class gdb_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_fprintf (void *stream, const char *format, ...)
-    ATTRIBUTE_PRINTF(2,3);
-
-  /* Print formatted message to STREAM, the content can be styled based on
-     STYLE if desired.  */
-  static int dis_asm_styled_fprintf (void *stream,
-				     enum disassembler_style style,
-				     const char *format, ...)
-    ATTRIBUTE_PRINTF(3,4);
-
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index 354c2b54e07..0956bd54f71 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7018,8 +7018,8 @@ reinit_frame_cache_sfunc (const char *args, int from_tty,
 static int
 gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
-- 
2.25.4



* [PATCHv4 2/5] gdb: add extension language print_insn hook
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
  2022-04-25  9:15       ` [PATCHv4 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-04-25  9:15       ` Andrew Burgess
  2022-05-03 13:42         ` Simon Marchi
  2022-04-25  9:15       ` [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
                         ` (4 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-04-25  9:15 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit is setup for the next commit.

In the next commit I will add a Python API to intercept the print_insn
calls within GDB.  Each print_insn call is responsible for
disassembling and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.

There should be no user visible change after this commit.
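
As a rough illustration only (this is not code from the patch), the
control flow described above can be modelled in a few lines of Python.
The names print_insn_with_hooks, declines, two_byte and arch_default
are hypothetical stand-ins; the real logic is the C++ in
gdb_print_insn_1 and ext_lang_print_insn.

```python
from typing import Callable, Iterable, Optional

# Hypothetical stand-in: each hook models one extension language's
# print_insn callback and returns an instruction length, or None.
Hook = Callable[[int], Optional[int]]


def print_insn_with_hooks(address: int,
                          hooks: Iterable[Optional[Hook]],
                          fallback: Callable[[int], int]) -> int:
    """Model of gdb_print_insn_1 combined with ext_lang_print_insn."""
    for hook in hooks:
        if hook is None:
            # This language does not implement the hook (a null entry
            # in its ops table), so skip it.
            continue
        length = hook(address)
        if length is not None:
            # The first extension language to return a length wins.
            return length
    # No extension language handled the request, so fall back to the
    # architecture's own disassembler (gdbarch_print_insn in GDB).
    return fallback(address)


def declines(address: int) -> Optional[int]:
    """A hook that never handles the instruction."""
    return None


def two_byte(address: int) -> Optional[int]:
    """A hook that claims every instruction is two bytes long."""
    return 2


def arch_default(address: int) -> int:
    """Stand-in for the builtin disassembler: four byte instructions."""
    return 4
```

With this commit both Python and Guile behave like the None entry
above, so every request reaches the fallback, which is why there is no
user visible change.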
---
 gdb/disasm.c         | 29 ++++++++++++++++++++++++++---
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 17 +++++++++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 85 insertions(+), 4 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 3f55e12665b..16e3c39b702 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -844,6 +844,29 @@ gdb_disassemble_info::~gdb_disassemble_info ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 /* See disasm.h.  */
 
 bool gdb_disassembler::use_ext_lang_colorization_p = true;
@@ -857,7 +880,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
   m_err_memaddr.reset ();
   m_buffer.clear ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   /* If we have successfully disassembled an instruction, styling is on, we
      think that the extension language might be able to perform styling for
@@ -892,7 +915,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
 	  gdb_assert (!m_buffer.term_out ());
 	  m_buffer.~string_file ();
 	  new (&m_buffer) string_file (true);
-	  length = gdbarch_print_insn (arch (), memaddr, &m_di);
+	  length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 	  gdb_assert (length > 0);
 	}
     }
@@ -1047,7 +1070,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index d9450b51231..7c74e721c57 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -263,6 +263,21 @@ struct extension_language_ops
      contents, or an empty optional.  */
   gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
 						 gdbarch *gdbarch);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassembler_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 8f39b86e952..62f41c6445d 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	(extlang->ops->print_insn (gdbarch, address, info));
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 7eb89530c44..f7518f91b35 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -327,6 +327,23 @@ extern gdb::optional<std::string> ext_lang_colorize
 extern gdb::optional<std::string> ext_lang_colorize_disasm
   (const std::string &content, gdbarch *gdbarch);
 
+/* Try to disassemble a single instruction.  ADDRESS is the
+   instruction's apparent address, though bytes for the instruction
+   should be read by calling INFO->read_memory_func as we might be
+   disassembling out of a buffer.  GDBARCH is the architecture in which we
+   are performing the disassembly.
+
+   The disassembled instruction should be printed by calling
+   INFO->fprintf_func, and the length (in octets) of the disassembled
+   instruction should be returned.
+
+   If no instruction could be disassembled then an empty value is returned
+   and GDB will call gdbarch_print_insn to perform the disassembly
+   itself.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #if GDB_SELF_TEST
 namespace selftests {
 extern void (*hook_set_active_ext_lang) ();
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index c7be48fb739..14b191ded62 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 11aaa7ae778..b5b8379e23c 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -166,6 +166,8 @@ static const struct extension_language_ops python_extension_ops =
   gdbpy_colorize,
 
   gdbpy_colorize_disasm,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 #endif /* HAVE_PYTHON */
-- 
2.25.4



* [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
  2022-04-25  9:15       ` [PATCHv4 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
  2022-04-25  9:15       ` [PATCHv4 2/5] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-04-25  9:15       ` Andrew Burgess
  2022-04-25 11:26         ` Eli Zaretskii
  2022-05-03 14:55         ` Simon Marchi
  2022-04-25  9:15       ` [PATCHv4 4/5] gdb: refactor the non-printing disassemblers Andrew Burgess
                         ` (3 subsequent siblings)
  6 siblings, 2 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-25  9:15 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to libopcodes disassemble_info structure, has read-only
  properties: address, architecture, and progspace.  And has methods:
  read_memory, and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user written disassembler function, by
  reading the properties, and calling the methods (and other support
  methods in the gdb.disassembler module) the user can perform and
  return the disassembly.

  Disassembler

      This is a base-class which user written disassemblers should
  inherit from.  This base class just provides base implementations of
  __init__ and __call__ which the user written disassembler should
  override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler, it's really just a wrapper around a string (the text
  of the disassembled instruction) and a length (in bytes).  The user
  can return an instance of this class from Disassembler.__call__ to
  represent the newly disassembled instruction.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, thus the user can adjust what GDB
  sees when it is disassembling.

The included documentation provides a more detailed description of the
API.
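
The registration and lookup rules for register_disassembler can be
sketched outside GDB as follows.  This is only a model under stated
assumptions: _registry, register and lookup are hypothetical names,
and plain strings stand in for Disassembler instances.  Inside GDB the
state lives in the gdb.disassembler module and the keys are bfd
architecture names or None.

```python
# Hypothetical stand-in for the module's private dictionary.  Keys are
# architecture names, or None for the global disassembler.
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE (None means global) and
    return whatever was registered under that key before, or None.
    Passing None as DISASSEMBLER removes the existing entry."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture specific entry, then the global entry,
    and return None when neither is registered."""
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)
```

Because the two kinds of entry are stored under different keys, the
order of registration does not matter: an architecture specific entry
always takes precedence over the global one.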
---
 gdb/Makefile.in                        |   1 +
 gdb/NEWS                               |  34 +
 gdb/data-directory/Makefile.in         |   1 +
 gdb/doc/python.texi                    | 247 +++++++
 gdb/python/lib/gdb/disassembler.py     | 109 +++
 gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
 gdb/python/python-internal.h           |  16 +
 gdb/python/python.c                    |   3 +-
 gdb/testsuite/gdb.python/py-disasm.c   |  25 +
 gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
 gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
 11 files changed, 2011 insertions(+), 1 deletion(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index ec0e55dd803..dc5ae4ce440 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index 760cb2b7abc..02ac6d70d31 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -38,6 +38,40 @@ maintenance info line-table
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a sub-class of
+       gdb.disassembler.Disassembler.  ARCH is either None or a string
+       containing a bfd architecture name.  DISASSEMBLER is registered
+       for architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned; this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', and 'architecture', as well as the
+       following method: 'read_memory'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index cb5283e03c0..a7e92af9cfe 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction disassembly in Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6562,6 +6565,250 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex python instruction disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.  This class has the following properties and
+methods:
+
+@defivar DisassembleInfo address
+An integer containing the address at which @value{GDBN} wishes to
+disassemble a single instruction.
+@end defivar
+
+@defivar DisassembleInfo architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defivar DisassembleInfo progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling.
+@end defivar
+
+@defmethod DisassembleInfo read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).
+@end defmethod
+
+@defmethod DisassembleInfo is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defmethod
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defmethod Disassembler __init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.  Currently, this name is only used in some
+debug output.
+@end defmethod
+
+@defmethod Disassembler __call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Alternatively, this function can return an object that represents the
+disassembled instruction.  The object must have the following two
+attributes:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+The @code{DisassemblerResult} type is defined as a possible class to
+represent disassembled instructions, but it is not required to use
+this type, so long as the required attributes are present.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory, this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error, @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defmethod
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class can be used to hold the result of calling
+@w{@code{Disassembler.__call__}}.  It is not required to use this
+type, any type with the required attributes will do.
+
+The required properties, which this class provides, are:
+
+@defvar length
+The length of the disassembled instruction in bytes, which must be
+greater than zero.
+@end defvar
+
+@defvar string
+A non-empty string representing the disassembled instruction.
+@end defvar
+
+This class also provides a constructor:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialise an instance of this class, @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+@end defun
+
+@defun builtin_disassemble (info, memory_source)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo}.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then a @code{gdb.MemoryError}
+exception will be raised.
+
+The optional @var{memory_source} argument has the default value of
+@code{None}, in which case the builtin disassembler will read the
+instruction from memory in the normal way.
+
+If @var{memory_source} is not @code{None}, then it should be an
+instance of a class that implements the following method:
+
+@defmethod memory_source read_memory (length, offset)
+This method will be called by the builtin disassembler to fetch bytes
+of the instruction being disassembled.  @var{length} is the number of
+bytes to fetch, and @var{offset} is the offset from the address of the
+instruction being disassembled, this address is obtained from
+@code{DisassembleInfo.address}.
+
+This function should return a Python object that supports the buffer
+protocol, i.e.@: a string, an array, or the object returned from
+@code{DisassembleInfo.read_memory}.
+
+The length of the returned buffer @emph{must} be @var{length}
+otherwise a @code{ValueError} exception will be raised.
+
+Alternatively, this function can raise a @code{gdb.MemoryError}
+exception to indicate that the read failed.  Raising any other
+exception type is an error.
+
+It is important to understand that, even when this function raises a
+@code{gdb.MemoryError}, it is the internal disassembler itself that
+reports the memory error to @value{GDBN}.  The reason for this is that
+the disassembler might probe memory to see if a byte is readable or
+not; if the byte can't be read then the disassembler may choose not to
+report an error, but instead to disassemble the bytes that it does
+have available.
+@end defmethod
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        if result.string is not None:
+            length = result.length
+            text = result.string + "\t## Comment"
+            return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..19ec0ecf82f
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,109 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler.  ARCHITECTURE is either
+    None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality, this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except:
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..e8b33fecee4
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,970 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object {
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object {
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_printing_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  gdbpy_ref<> address_obj = gdb_py_object_from_longest (address);
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj.get ());
+}
+
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))					\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  if (!disasm_info_object_is_valid (disasm_info))
+    {
+      PyErr_SetString (PyExc_RuntimeError,
+		       _("DisassembleInfo is no longer valid."));
+      return nullptr;
+    }
+
+  /* A memory source is any object that provides the 'read_memory'
+     callback.  At this point we only check for the existence of a
+     'read_memory' attribute, if this isn't callable then we'll throw an
+     exception from within gdbpy_disassembler::read_memory_func.  */
+  if (memory_source_obj != nullptr)
+    {
+      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
+	{
+	  PyErr_SetString (PyExc_TypeError,
+			   _("memory_source doesn't have a read_memory method"));
+	  return nullptr;
+	}
+    }
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers who don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create an object to represent the result of the disassembler.  */
+  gdbpy_ref<disasm_result_object> res
+    (PyObject_New (disasm_result_object, &disasm_result_object_type));
+  res->length = length;
+  res->content = new string_file;
+  *(res->content) = disassembler.release ();
+
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement _gdb.disassembler._set_enabled.  Takes a boolean parameter,
+   and sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback.  This
+   will either call the standard read memory function, or, if the user has
+   supplied a memory source (see disasmpy_builtin_disassemble) then this
+   will call back into Python to obtain the memory contents.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+  PyObject *memory_source = dis->m_memory_source;
+
+  /* The simple case, the user didn't pass a separate memory source, so we
+     just delegate to the standard disassemble_info read_memory_func,
+     passing in the original disassemble_info object, which core GDB might
+     require in order to read the instruction bytes (when reading the
+     instruction from a buffer).  */
+  if (memory_source == nullptr)
+    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
+
+  /* The user provided a separate memory source, we need to call the
+     read_memory method on the memory source and use the buffer it returns
+     as the bytes of memory.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
+					       "IL", len, (long long) offset));
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read, if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Result from read_memory is incorrectly sized buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", nullptr };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  obj->length = length;
+  obj->content->write (string, strlen (string));
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   gdbpy_disassembler from INFO, and record MEMADDR as the address at
+   which the memory error occurred.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid; this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    m_disasm_info->address = memaddr;
+    m_disasm_info->gdb_info = info;
+    m_disasm_info->gdbarch = gdbarch;
+    m_disasm_info->program_space = current_program_space;
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    m_disasm_info->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* The attribute we are going to look up that provides the print_insn
+     functionality.  */
+  static const char *callback_name = "_print_insn";
+
+  /* Grab a reference to the gdb.disassembler module, and check it has the
+     attribute that we need.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr
+      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
+				  callback_name))
+    return {};
+
+  /* Now grab the callback attribute from the module.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     callback_name));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print the stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fall back to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      gdbpy_ref<> addr_obj (PyObject_GetAttrString (error_value,
+							    "address"));
+	      if (get_addr_from_python (addr_obj.get (), &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("string attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0 || length > max_insn_length)
+    {
+      PyErr_SetString (PyExc_ValueError, _("Invalid length attribute."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm
+(void)
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  /* Having the tp_new field as nullptr means that this class can't be
+     created from user code.  The only way they can be created is from
+     within GDB, and then they are passed into user code.  */
+  gdb_assert (disasm_info_object_type.tp_new == nullptr);
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  0,						/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT,				/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset			/* tp_getset */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index b5b8379e23c..084b3687fec 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..ea7847fc6df
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,150 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements, the
+# first element is the name of a class in py-disasm.py, this is a
+# disassembler class.  The second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(analyzing_disassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..a05244dbb1b
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,456 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super(TestDisassembler, self).__init__("TestDisassembler")
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        return self.disassemble(info)
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        len = result.length
+        str = ""
+        for o in range(len):
+            if str != "":
+                str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super(TaggingDisassembler, self).__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in. Once
+    the call into the disassembler is complete then the DisassembleInfo
+    becomes invalid, and any calls into it should trigger an
+    exception."""
+
+    # This is where we cache the DisassembleInfo object.
+    cached_insn_disas = None
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas = info
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        info = GlobalCachingDisassembler.cached_insn_disas
+        assert isinstance(info, gdb.disassembler.DisassembleInfo)
+        assert not info.is_valid()
+        try:
+            val = info.address
+            raise gdb.GdbError("DisassembleInfo.address is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+        try:
+            val = info.architecture
+            raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError(
+                "DisassembleInfo.architecture raised an unexpected exception"
+            )
+
+        try:
+            val = info.read_memory(1, 0)
+            raise gdb.GdbError("DisassembleInfo.read is still valid")
+        except RuntimeError as e:
+            assert str(e) == "DisassembleInfo is no longer valid."
+        except:
+            raise gdb.GdbError("DisassembleInfo.read raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        # Throw a memory error with a specific address.  We don't
+        # expect this address to show up in the output though.
+        raise gdb.MemoryError(0x1234)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    def disassemble(self, info):
+        return gdb.disassembler.builtin_disassemble(info, self)
+
+    def read_memory(self, len, offset):
+        raise gdb.GdbError("the memory source failed")
+
+
+class AnalyzingDisassembler(Disassembler):
+    def __init__(self, name):
+        """Constructor."""
+        super(AnalyzingDisassembler, self).__init__(name)
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # The DisassembleInfo object passed into __call__ as INFO.
+        self._info = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Record INFO, we'll need to refer to this in READ_MEMORY which is
+        # called back to by the builtin disassembler.
+        self._info = info
+        result = gdb.disassembler.builtin_disassemble(info, self)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def _read_replacement(self, length, offset):
+        """Return a slice of the buffer representing the replacement nop
+        instructions."""
+
+        assert self._nop_bytes is not None
+        rb = self._nop_bytes
+
+        # If this request is outside of a nop instruction then we don't know
+        # what to do, so just raise a memory error.
+        if offset >= len(rb) or (offset + length) > len(rb):
+            raise gdb.MemoryError("invalid length and offset combination")
+
+        # Return only the slice of the nop instruction as requested.
+        s = offset
+        e = offset + length
+        return rb[s:e]
+
+    def read_memory(self, len, offset):
+        """Callback used from the builtin disassembler to read the contents of
+        memory."""
+
+        info = self._info
+        assert info is not None
+
+        # If this request is within the region we are replacing with 'nop'
+        # instructions, then call the helper function to perform that
+        # replacement.
+        if self._start is not None:
+            assert self._end is not None
+            if info.address >= self._start and info.address < self._end:
+                return self._read_replacement(len, offset)
+
+        # Otherwise, we just forward this request to the default read memory
+        # implementation.
+        return info.read_memory(len, offset)
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = int(self._pass_1_length[replace_idx] / self._pass_1_length[nop_idx])
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+# Create a global instance of the AnalyzingDisassembler.  This isn't
+# registered as a disassembler yet though, that is done from the
+# py-disasm.exp later.
+analyzing_disassembler = AnalyzingDisassembler("AnalyzingDisassembler")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4


* [PATCHv4 4/5] gdb: refactor the non-printing disassemblers
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
                         ` (2 preceding siblings ...)
  2022-04-25  9:15       ` [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-04-25  9:15       ` Andrew Burgess
  2022-04-25  9:15       ` [PATCHv4 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
                         ` (2 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-25  9:15 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit started from an observation I made while working on some
other disassembler patches: the function gdb_buffered_insn_length is
broken ... sort of.

I noticed that the gdb_buffered_insn_length function doesn't set up
the application_data field of the disassemble_info structure.

Further, I noticed that some architectures, for example, ARM, require
that the application_data field be set, see gdb_print_insn_arm in
arm-tdep.c.

And so, if we ever use gdb_buffered_insn_length for ARM, then GDB will
likely crash.  Which is why I said only "sort of" broken.  Right now
we don't use gdb_buffered_insn_length with ARM, so maybe it isn't
broken yet?

Anyway, to prove to myself that there was a problem here I extended the
disassembler self tests in disasm-selftests.c to include a test of
gdb_buffered_insn_length.  Because the test runs for all architectures,
I do indeed see GDB crash for ARM.

To fix this we need gdb_buffered_insn_length to create a disassembler
that inherits from gdb_disassemble_info, but we also need this new
disassembler to not print anything.

And so, I introduce a new gdb_non_printing_disassembler class; this is
a disassembler that doesn't print anything to the output stream.

I then observed that both ARC and S12Z also create non-printing
disassemblers, but these are slightly different.  While the
disassembler in gdb_non_printing_disassembler reads the instruction
from a buffer, the ARC and S12Z disassemblers read from target memory
using target_read_code.

And so, I further split gdb_non_printing_disassembler into two
sub-classes, gdb_non_printing_memory_disassembler and
gdb_non_printing_buffer_disassembler.

The new selftests now pass, but otherwise, there should be no
user-visible changes after this commit.
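
As a rough illustration of the layering described above, here is a
minimal, self-contained sketch.  It is not the code from this patch:
every sketch_* name is hypothetical, sketch_info is a cut-down stand-in
for the real disassemble_info, and sketch_fake_target_read only
imitates the role that target_read_code plays for ARC and S12Z.  The
point it tries to show is that the base class always fills in
application_data, the non-printing layer discards all output, and the
two leaf classes differ only in where the instruction bytes come from.

```cpp
// Sketch only: every type here is a simplified stand-in, not the real
// GDB or opcodes definition.
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct sketch_info
{
  void *application_data = nullptr;
  int (*print_func) (void *stream, const char *text) = nullptr;
  int (*read_memory_func) (std::uint64_t addr, std::uint8_t *out,
                           std::size_t len, sketch_info *info) = nullptr;
};

// Base layer: owns the info struct and always sets application_data.
class sketch_disassemble_info_base
{
public:
  sketch_disassemble_info_base (const sketch_disassemble_info_base &) = delete;
  sketch_info *disasm_info () { return &m_info; }
protected:
  sketch_disassemble_info_base () { m_info.application_data = this; }
  sketch_info m_info;
};

// Non-printing layer: installs a print callback that discards its input.
class sketch_non_printing_disassembler : public sketch_disassemble_info_base
{
protected:
  sketch_non_printing_disassembler () { m_info.print_func = discard; }
private:
  static int discard (void *, const char *) { return 0; }
};

// Leaf 1: instruction bytes come from a caller-supplied buffer.
class sketch_buffer_disassembler : public sketch_non_printing_disassembler
{
public:
  sketch_buffer_disassembler (const std::vector<std::uint8_t> &bytes,
                              std::uint64_t vma)
    : m_bytes (bytes), m_vma (vma)
  { m_info.read_memory_func = read; }
private:
  static int read (std::uint64_t addr, std::uint8_t *out, std::size_t len,
                   sketch_info *info)
  {
    auto *base
      = static_cast<sketch_disassemble_info_base *> (info->application_data);
    auto *self = static_cast<sketch_buffer_disassembler *> (base);
    if (addr < self->m_vma)
      return -1;
    std::uint64_t off = addr - self->m_vma;
    if (off > self->m_bytes.size () || len > self->m_bytes.size () - off)
      return -1;
    std::memcpy (out, self->m_bytes.data () + off, len);
    return 0;
  }
  std::vector<std::uint8_t> m_bytes;
  std::uint64_t m_vma;
};

// Hypothetical stand-in for a target memory read: four bytes at 0x1000.
static int sketch_fake_target_read (std::uint64_t addr, std::uint8_t *out,
                                    std::size_t len)
{
  static const std::uint8_t region[4] = { 0x90, 0x90, 0x90, 0x90 };
  const std::uint64_t start = 0x1000;
  if (addr < start || addr - start > sizeof region
      || len > sizeof region - (addr - start))
    return -1;
  std::memcpy (out, region + (addr - start), len);
  return 0;
}

using sketch_target_read = int (*) (std::uint64_t, std::uint8_t *, std::size_t);

// Leaf 2: instruction bytes come from "target memory" via a callback.
class sketch_memory_disassembler : public sketch_non_printing_disassembler
{
public:
  explicit sketch_memory_disassembler (sketch_target_read reader)
    : m_reader (reader)
  { m_info.read_memory_func = read; }
private:
  static int read (std::uint64_t addr, std::uint8_t *out, std::size_t len,
                   sketch_info *info)
  {
    auto *base
      = static_cast<sketch_disassemble_info_base *> (info->application_data);
    auto *self = static_cast<sketch_memory_disassembler *> (base);
    return self->m_reader (addr, out, len);
  }
  sketch_target_read m_reader;
};

// Toy print_insn that, like the ARM case, depends on application_data.
// Returns the instruction length (always 1 here), or -1 on failure.
static int sketch_print_insn (std::uint64_t addr, sketch_info *info)
{
  if (info->application_data == nullptr)
    return -1;  // Real code would dereference the pointer here.
  assert (info->read_memory_func != nullptr && info->print_func != nullptr);
  std::uint8_t byte;
  if (info->read_memory_func (addr, &byte, 1, info) != 0)
    return -1;
  info->print_func (nullptr, "insn");
  return 1;
}
```

In this sketch a hand-built sketch_info with no application_data is
rejected by sketch_print_insn, while an info obtained from either leaf
class is accepted, which mirrors the reason the length-only helpers
need to be built on top of the common base class.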
---
 gdb/arc-linux-tdep.c   | 15 +++----
 gdb/arc-tdep.c         | 29 +++-----------
 gdb/arc-tdep.h         |  5 ---
 gdb/disasm-selftests.c | 70 ++++++++++++++++++++++++++-------
 gdb/disasm.c           | 88 ++++++++++++++++++------------------------
 gdb/disasm.h           | 56 ++++++++++++++++++++++++---
 gdb/s12z-tdep.c        | 26 +------------
 7 files changed, 158 insertions(+), 131 deletions(-)

diff --git a/gdb/arc-linux-tdep.c b/gdb/arc-linux-tdep.c
index 13595f2e8e9..04ca38f1355 100644
--- a/gdb/arc-linux-tdep.c
+++ b/gdb/arc-linux-tdep.c
@@ -356,7 +356,7 @@ arc_linux_sw_breakpoint_from_kind (struct gdbarch *gdbarch,
    */
 
 static std::vector<CORE_ADDR>
-handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
+handle_atomic_sequence (arc_instruction insn, disassemble_info *di)
 {
   const int atomic_seq_len = 24;    /* Instruction sequence length.  */
   std::vector<CORE_ADDR> next_pcs;
@@ -374,7 +374,7 @@ handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
   for (int insn_count = 0; insn_count < atomic_seq_len; ++insn_count)
     {
       arc_insn_decode (arc_insn_get_linear_next_pc (insn),
-		       &di, arc_delayed_print_insn, &insn);
+		       di, arc_delayed_print_insn, &insn);
 
       if (insn.insn_class == BRCC)
         {
@@ -412,15 +412,15 @@ arc_linux_software_single_step (struct regcache *regcache)
 {
   struct gdbarch *gdbarch = regcache->arch ();
   arc_gdbarch_tdep *tdep = (arc_gdbarch_tdep *) gdbarch_tdep (gdbarch);
-  struct disassemble_info di = arc_disassemble_info (gdbarch);
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   /* Read current instruction.  */
   struct arc_instruction curr_insn;
-  arc_insn_decode (regcache_read_pc (regcache), &di, arc_delayed_print_insn,
-		   &curr_insn);
+  arc_insn_decode (regcache_read_pc (regcache), dis.disasm_info (),
+		   arc_delayed_print_insn, &curr_insn);
 
   if (curr_insn.insn_class == LLOCK)
-    return handle_atomic_sequence (curr_insn, di);
+    return handle_atomic_sequence (curr_insn, dis.disasm_info ());
 
   CORE_ADDR next_pc = arc_insn_get_linear_next_pc (curr_insn);
   std::vector<CORE_ADDR> next_pcs;
@@ -431,7 +431,8 @@ arc_linux_software_single_step (struct regcache *regcache)
   if (curr_insn.has_delay_slot)
     {
       struct arc_instruction next_insn;
-      arc_insn_decode (next_pc, &di, arc_delayed_print_insn, &next_insn);
+      arc_insn_decode (next_pc, dis.disasm_info (), arc_delayed_print_insn,
+		       &next_insn);
       next_pcs.push_back (arc_insn_get_linear_next_pc (next_insn));
     }
   else
diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
index 98bd1c4bc0a..75fd3077ca7 100644
--- a/gdb/arc-tdep.c
+++ b/gdb/arc-tdep.c
@@ -1306,24 +1306,6 @@ arc_is_in_prologue (struct gdbarch *gdbarch, const struct arc_instruction &insn,
   return false;
 }
 
-/* See arc-tdep.h.  */
-
-struct disassemble_info
-arc_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
 /* Analyze the prologue and update the corresponding frame cache for the frame
    unwinder for unwinding frames that doesn't have debug info.  In such
    situation GDB attempts to parse instructions in the prologue to understand
@@ -1394,9 +1376,10 @@ arc_analyze_prologue (struct gdbarch *gdbarch, const CORE_ADDR entrypoint,
   while (current_prologue_end < limit_pc)
     {
       struct arc_instruction insn;
-      struct disassemble_info di = arc_disassemble_info (gdbarch);
-      arc_insn_decode (current_prologue_end, &di, arc_delayed_print_insn,
-		       &insn);
+
+      struct gdb_non_printing_memory_disassembler dis (gdbarch);
+      arc_insn_decode (current_prologue_end, dis.disasm_info (),
+		       arc_delayed_print_insn, &insn);
 
       if (arc_debug)
 	arc_insn_dump (insn);
@@ -2460,8 +2443,8 @@ dump_arc_instruction_command (const char *args, int from_tty)
 
   CORE_ADDR address = value_as_address (val);
   struct arc_instruction insn;
-  struct disassemble_info di = arc_disassemble_info (target_gdbarch ());
-  arc_insn_decode (address, &di, arc_delayed_print_insn, &insn);
+  struct gdb_non_printing_memory_disassembler dis (target_gdbarch ());
+  arc_insn_decode (address, dis.disasm_info (), arc_delayed_print_insn, &insn);
   arc_insn_dump (insn);
 }
 
diff --git a/gdb/arc-tdep.h b/gdb/arc-tdep.h
index ceca003204f..53e5d8476fc 100644
--- a/gdb/arc-tdep.h
+++ b/gdb/arc-tdep.h
@@ -186,11 +186,6 @@ arc_arch_is_em (const struct bfd_arch_info* arch)
    can't be set to an actual NULL value - that would cause a crash.  */
 int arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info);
 
-/* Return properly initialized disassemble_info for ARC disassembler - it will
-   not print disassembled instructions to stderr.  */
-
-struct disassemble_info arc_disassemble_info (struct gdbarch *gdbarch);
-
 /* Get branch/jump target address for the INSN.  Note that this function
    returns branch target and doesn't evaluate if this branch is taken or not.
    For the indirect jumps value depends in register state, hence can change.
diff --git a/gdb/disasm-selftests.c b/gdb/disasm-selftests.c
index 928d26f7018..07586f04abd 100644
--- a/gdb/disasm-selftests.c
+++ b/gdb/disasm-selftests.c
@@ -25,13 +25,19 @@
 
 namespace selftests {
 
-/* Test disassembly of one instruction.  */
+/* Return a pointer to a buffer containing an instruction that can be
+   disassembled for architecture GDBARCH.  *LEN will be set to the length
+   of the returned buffer.
 
-static void
-print_one_insn_test (struct gdbarch *gdbarch)
+   If there's no known instruction to disassemble for GDBARCH (because we
+   haven't figured one out, not because no instructions exist) then nullptr
+   is returned, and *LEN is set to 0.  */
+
+static const gdb_byte *
+get_test_insn (struct gdbarch *gdbarch, size_t *len)
 {
-  size_t len = 0;
-  const gdb_byte *insn = NULL;
+  *len = 0;
+  const gdb_byte *insn = nullptr;
 
   switch (gdbarch_bfd_arch_info (gdbarch)->arch)
     {
@@ -40,34 +46,34 @@ print_one_insn_test (struct gdbarch *gdbarch)
       static const gdb_byte bfin_insn[] = {0x17, 0xe1, 0xff, 0xff};
 
       insn = bfin_insn;
-      len = sizeof (bfin_insn);
+      *len = sizeof (bfin_insn);
       break;
     case bfd_arch_arm:
       /* mov     r0, #0 */
       static const gdb_byte arm_insn[] = {0x0, 0x0, 0xa0, 0xe3};
 
       insn = arm_insn;
-      len = sizeof (arm_insn);
+      *len = sizeof (arm_insn);
       break;
     case bfd_arch_ia64:
     case bfd_arch_mep:
     case bfd_arch_mips:
     case bfd_arch_tic6x:
     case bfd_arch_xtensa:
-      return;
+      return insn;
     case bfd_arch_s390:
       /* nopr %r7 */
       static const gdb_byte s390_insn[] = {0x07, 0x07};
 
       insn = s390_insn;
-      len = sizeof (s390_insn);
+      *len = sizeof (s390_insn);
       break;
     case bfd_arch_xstormy16:
       /* nop */
       static const gdb_byte xstormy16_insn[] = {0x0, 0x0};
 
       insn = xstormy16_insn;
-      len = sizeof (xstormy16_insn);
+      *len = sizeof (xstormy16_insn);
       break;
     case bfd_arch_nios2:
     case bfd_arch_score:
@@ -78,13 +84,13 @@ print_one_insn_test (struct gdbarch *gdbarch)
       {
 	int bplen;
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, 4, &bplen);
-	len = bplen;
+	*len = bplen;
       }
       break;
     case bfd_arch_arc:
       /* PR 21003 */
       if (gdbarch_bfd_arch_info (gdbarch)->mach == bfd_mach_arc_arc601)
-	return;
+	return insn;
       goto generic_case;
     case bfd_arch_i386:
       {
@@ -93,7 +99,7 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	   opcodes rejects an attempt to disassemble for an arch with
 	   a 64-bit address size when bfd_vma is 32-bit.  */
 	if (info->bits_per_address > sizeof (bfd_vma) * CHAR_BIT)
-	  return;
+	  return insn;
       }
       /* fall through */
     default:
@@ -105,12 +111,26 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	int bplen;
 
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, kind, &bplen);
-	len = bplen;
+	*len = bplen;
 
 	break;
       }
     }
-  SELF_CHECK (len > 0);
+  SELF_CHECK (*len > 0);
+
+  return insn;
+}
+
+/* Test disassembly of one instruction.  */
+
+static void
+print_one_insn_test (struct gdbarch *gdbarch)
+{
+  size_t len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &len);
+
+  if (insn == nullptr)
+    return;
 
   /* Test gdb_disassembler for a given gdbarch by reading data from a
      pre-allocated buffer.  If you want to see the disassembled
@@ -175,6 +195,24 @@ print_one_insn_test (struct gdbarch *gdbarch)
   SELF_CHECK (di.print_insn (0) == len);
 }
 
+/* Test the gdb_buffered_insn_length function.  */
+
+static void
+buffered_insn_length_test (struct gdbarch *gdbarch)
+{
+  size_t buf_len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &buf_len);
+
+  if (insn == nullptr)
+    return;
+
+  CORE_ADDR insn_address = 0;
+  int calculated_len = gdb_buffered_insn_length (gdbarch, insn, buf_len,
+						 insn_address);
+
+  SELF_CHECK (calculated_len == buf_len);
+}
+
 /* Test disassembly on memory error.  */
 
 static void
@@ -235,4 +273,6 @@ _initialize_disasm_selftests ()
 					 selftests::print_one_insn_test);
   selftests::register_test_foreach_arch ("memory_error",
 					 selftests::memory_error_test);
+  selftests::register_test_foreach_arch ("buffered_insn_length",
+					 selftests::buffered_insn_length_test);
 }
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 16e3c39b702..1dfa141b10b 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -996,66 +996,56 @@ gdb_insn_length (struct gdbarch *gdbarch, CORE_ADDR addr)
   return gdb_print_insn (gdbarch, addr, &null_stream, NULL);
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything.  Always returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (2, 3)
-gdb_disasm_null_printf (void *stream, const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_func (void *stream,
+						  const char *format, ...)
 {
   return 0;
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything, and the disassembler is using style.  Always
-   returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (3, 4)
-gdb_disasm_null_styled_printf (void *stream,
-			       enum disassembler_style style,
-			       const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_styled_func
+  (void *stream, enum disassembler_style style, const char *format, ...)
 {
   return 0;
 }
 
 /* See disasm.h.  */
 
-void
-init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
+int
+gdb_non_printing_memory_disassembler::dis_asm_read_memory
+  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+   struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
-			 gdb_disasm_null_styled_printf);
+  return target_read_code (memaddr, myaddr, length);
 }
 
-/* Initialize a struct disassemble_info for gdb_buffered_insn_length.
-   Upon return, *DISASSEMBLER_OPTIONS_HOLDER owns the string pointed
-   to by DI.DISASSEMBLER_OPTIONS.  */
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from a buffer passed to the constructor.  */
 
-static void
-gdb_buffered_insn_length_init_dis (struct gdbarch *gdbarch,
-				   struct disassemble_info *di,
-				   const gdb_byte *insn, int max_len,
-				   CORE_ADDR addr,
-				   std::string *disassembler_options_holder)
+struct gdb_non_printing_buffer_disassembler
+  : public gdb_non_printing_disassembler
 {
-  init_disassemble_info_for_no_printing (di);
-
-  /* init_disassemble_info installs buffer_read_memory, etc.
-     so we don't need to do that here.
-     The cast is necessary until disassemble_info is const-ified.  */
-  di->buffer = (gdb_byte *) insn;
-  di->buffer_length = max_len;
-  di->buffer_vma = addr;
-
-  di->arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di->mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di->endian = gdbarch_byte_order (gdbarch);
-  di->endian_code = gdbarch_byte_order_for_code (gdbarch);
-
-  *disassembler_options_holder = get_all_disassembler_options (gdbarch);
-  if (!disassembler_options_holder->empty ())
-    di->disassembler_options = disassembler_options_holder->c_str ();
-  disassemble_init_for_target (di);
-}
+  /* Constructor.  GDBARCH is the architecture to disassemble for, BUFFER
+     contains the instruction to disassemble, and INSN_ADDRESS is the
+     address (in target memory) of the instruction to disassemble.  */
+  gdb_non_printing_buffer_disassembler (struct gdbarch *gdbarch,
+					gdb::array_view<const gdb_byte> buffer,
+					CORE_ADDR insn_address)
+    : gdb_non_printing_disassembler (gdbarch, nullptr)
+  {
+    /* The cast is necessary until disassemble_info is const-ified.  */
+    m_di.buffer = (gdb_byte *) buffer.data ();
+    m_di.buffer_length = buffer.size ();
+    m_di.buffer_vma = insn_address;
+  }
+};
 
 /* Return the length in bytes of INSN.  MAX_LEN is the size of the
    buffer containing INSN.  */
@@ -1064,14 +1054,10 @@ int
 gdb_buffered_insn_length (struct gdbarch *gdbarch,
 			  const gdb_byte *insn, int max_len, CORE_ADDR addr)
 {
-  struct disassemble_info di;
-  std::string disassembler_options_holder;
-
-  gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
-				     &disassembler_options_holder);
-
-  int result = gdb_print_insn_1 (gdbarch, addr, &di);
-  disassemble_free_target (&di);
+  gdb::array_view<const gdb_byte> buffer
+    = gdb::make_array_view (insn, max_len);
+  gdb_non_printing_buffer_disassembler dis (gdbarch, buffer, addr);
+  int result = gdb_print_insn_1 (gdbarch, addr, dis.disasm_info ());
   return result;
 }
 
diff --git a/gdb/disasm.h b/gdb/disasm.h
index b3e40e2981e..6c1d7673b01 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -136,6 +136,56 @@ struct gdb_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* A basic disassembler that doesn't actually print anything.  */
+
+struct gdb_non_printing_disassembler : public gdb_disassemble_info
+{
+  gdb_non_printing_disassembler (struct gdbarch *gdbarch,
+				 read_memory_ftype read_memory_func)
+    : gdb_disassemble_info (gdbarch, nullptr /* stream */,
+			    read_memory_func,
+			    nullptr /* memory_error_func */,
+			    nullptr /* print_address_func */,
+			    null_fprintf_func,
+			    null_fprintf_styled_func)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_styled_func (void *stream,
+				       enum disassembler_style style,
+				       const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from target memory.  */
+
+struct gdb_non_printing_memory_disassembler
+  : public gdb_non_printing_disassembler
+{
+  /* Constructor.  GDBARCH is the architecture to disassemble for.  */
+  gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
+    :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
@@ -278,10 +328,4 @@ extern char *get_disassembler_options (struct gdbarch *gdbarch);
 
 extern void set_disassembler_options (const char *options);
 
-/* Setup DINFO with its output function and output stream setup so that
-   nothing is printed while disassembling.  */
-
-extern void init_disassemble_info_for_no_printing
-  (struct disassemble_info *dinfo);
-
 #endif
diff --git a/gdb/s12z-tdep.c b/gdb/s12z-tdep.c
index 5394c1bbf5e..4e33faaea9a 100644
--- a/gdb/s12z-tdep.c
+++ b/gdb/s12z-tdep.c
@@ -141,27 +141,6 @@ s12z_dwarf_reg_to_regnum (struct gdbarch *gdbarch, int num)
 
 /* Support functions for frame handling.  */
 
-
-/* Return a disassemble_info initialized for s12z disassembly, however,
-   the disassembler will not actually print anything.  */
-
-static struct disassemble_info
-s12z_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
-
 /* A struct (based on mem_read_abstraction_base) to read memory
    through the disassemble_info API.  */
 struct mem_read_abstraction
@@ -332,15 +311,14 @@ s12z_frame_cache (struct frame_info *this_frame, void **prologue_cache)
   int frame_size = 0;
   int saved_frame_size = 0;
 
-  struct disassemble_info di = s12z_disassemble_info (gdbarch);
-
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   struct mem_read_abstraction mra;
   mra.base.read = (int (*)(mem_read_abstraction_base*,
 			   int, size_t, bfd_byte*)) abstract_read_memory;
   mra.base.advance = advance ;
   mra.base.posn = posn;
-  mra.info = &di;
+  mra.info = dis.disasm_info ();
 
   while (this_pc > addr)
     {
-- 
2.25.4



* [PATCHv4 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
                         ` (3 preceding siblings ...)
  2022-04-25  9:15       ` [PATCHv4 4/5] gdb: refactor the non-printing disassemblers Andrew Burgess
@ 2022-04-25  9:15       ` Andrew Burgess
  2022-05-03 10:12       ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-04-25  9:15 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

After the recent restructuring of the disassembler code, GDB has ended
up with two class static functions, both called dis_asm_read_memory,
with identical implementations.

My first thought was to move these out of their respective classes,
and just make them global functions, then I'd only need a single
copy.

And maybe that's the right way to go.  But I disliked that, by doing
so, I would lose the encapsulation of the method with the corresponding
disassembler class.

So, instead, I placed the static method into its own class, and had
both the gdb_non_printing_memory_disassembler and gdb_disassembler
classes inherit from this new class as an additional base-class.

In terms of code generated, I don't think there's any significant
difference with this approach, but I think this better reflects how
the function is closely tied to the disassembler.
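
As a rough, self-contained illustration of the pattern described above
(a shared static method placed in a helper class that two otherwise
unrelated classes inherit from privately), consider the sketch that
follows.  All toy_* names and the callback signature are hypothetical
and much simpler than the real GDB classes.

```cpp
#include <cassert>
#include <cstdint>

/* Hypothetical callback signature, loosely modelled on a read-memory
   hook.  Returns 0 on success.  */
using toy_read_ftype = int (*) (uint64_t addr, uint8_t *out,
                                unsigned int len);

/* Helper base class that holds the single shared implementation.  The
   "memory" here is fake: each byte is just the low 8 bits of its
   address.  */
struct toy_memory_reader
{
  static int read_memory (uint64_t addr, uint8_t *out, unsigned int len)
  {
    for (unsigned int i = 0; i < len; ++i)
      out[i] = static_cast<uint8_t> (addr + i);
    return 0;
  }
};

/* Common base that simply remembers which reader it was given.  */
struct toy_base
{
  explicit toy_base (toy_read_ftype reader) : m_reader (reader) {}

  toy_read_ftype reader () const { return m_reader; }

private:
  toy_read_ftype m_reader;
};

/* Two otherwise unrelated classes reuse the same static method.  The
   private inheritance keeps read_memory out of their public
   interfaces.  */
struct toy_printing : toy_base, private toy_memory_reader
{
  toy_printing () : toy_base (read_memory) {}
};

struct toy_non_printing : toy_base, private toy_memory_reader
{
  toy_non_printing () : toy_base (read_memory) {}
};
```

Because the helper has no data members and only a static method, the
extra base class adds no per-object state; both classes end up handing
out the address of the same function.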

There should be no user-visible changes after this commit.
---
 gdb/disasm.c | 16 +++-------------
 gdb/disasm.h | 29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 1dfa141b10b..563cef8b845 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -132,9 +132,9 @@ line_has_code_p (htab_t table, struct symtab *symtab, int line)
 /* Wrapper of target_read_code.  */
 
 int
-gdb_disassembler::dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				       unsigned int len,
-				       struct disassemble_info *info)
+gdb_disassembler_memory_reader::dis_asm_read_memory
+  (bfd_vma memaddr, gdb_byte *myaddr, unsigned int len,
+   struct disassemble_info *info)
 {
   return target_read_code (memaddr, myaddr, len);
 }
@@ -1014,16 +1014,6 @@ gdb_non_printing_disassembler::null_fprintf_styled_func
   return 0;
 }
 
-/* See disasm.h.  */
-
-int
-gdb_non_printing_memory_disassembler::dis_asm_read_memory
-  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
-   struct disassemble_info *dinfo)
-{
-  return target_read_code (memaddr, myaddr, length);
-}
-
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 6c1d7673b01..5d1112cf0d6 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -165,31 +165,39 @@ struct gdb_non_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* This is a helper class, for use as an additional base-class, by some of
+   the disassembler classes below.  This class just defines a static method
+   for reading from target memory, which can then be used by the various
+   disassembler sub-classes.  */
+
+struct gdb_disassembler_memory_reader
+{
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
    read from target memory.  */
 
 struct gdb_non_printing_memory_disassembler
-  : public gdb_non_printing_disassembler
+  : public gdb_non_printing_disassembler,
+    private gdb_disassembler_memory_reader
 {
   /* Constructor.  GDBARCH is the architecture to disassemble for.  */
   gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
     :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
   { /* Nothing.  */ }
-
-private:
-
-  /* Implements the read_memory_func disassemble_info callback.  */
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
 };
 
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
-struct gdb_disassembler : public gdb_printing_disassembler
+struct gdb_disassembler : public gdb_printing_disassembler,
+			  private gdb_disassembler_memory_reader
 {
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
     : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
@@ -239,9 +247,6 @@ struct gdb_disassembler : public gdb_printing_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
   static void dis_asm_memory_error (int err, bfd_vma memaddr,
 				    struct disassemble_info *info);
   static void dis_asm_print_address (bfd_vma addr,
-- 
2.25.4



* Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-04-25  9:15       ` [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-04-25 11:26         ` Eli Zaretskii
  2022-05-03 14:55         ` Simon Marchi
  1 sibling, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2022-04-25 11:26 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches, andrew.burgess

> Date: Mon, 25 Apr 2022 10:15:39 +0100
> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
> 
>  gdb/Makefile.in                        |   1 +
>  gdb/NEWS                               |  34 +
>  gdb/data-directory/Makefile.in         |   1 +
>  gdb/doc/python.texi                    | 247 +++++++
>  gdb/python/lib/gdb/disassembler.py     | 109 +++
>  gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
>  gdb/python/python-internal.h           |  16 +
>  gdb/python/python.c                    |   3 +-
>  gdb/testsuite/gdb.python/py-disasm.c   |  25 +
>  gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
>  gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
>  11 files changed, 2011 insertions(+), 1 deletion(-)
>  create mode 100644 gdb/python/lib/gdb/disassembler.py
>  create mode 100644 gdb/python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

Thanks, the documentation parts are OK.


* Re: [PATCHv4 0/5] Add Python API for the disassembler
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
                         ` (4 preceding siblings ...)
  2022-04-25  9:15       ` [PATCHv4 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
@ 2022-05-03 10:12       ` Andrew Burgess
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-03 10:12 UTC (permalink / raw)
  To: gdb-patches


Ping!

I'd like to see if I can get this work moved forward.

I'd probably just push this work on the grounds we could back it out if
anyone complains after the fact ... except for patch #3, which extends
the Python API, and I'd prefer at least a little review before I merge
this, even if it was just reading the documentation to check the new API
makes sense.

All thoughts welcome,

Thanks,
Andrew



Andrew Burgess <aburgess@redhat.com> writes:

> Changes in v4:
>
>   - Patch #1 from v3 series has been merged,
>
>   - Addressed Eli's feedback on previous series,
>
>   - Rebased onto current upstream/master.
>
> Changes in v3:
>
>   - Rebased to current master, and retested,
>
>   - Patch #1 is new in this series,
>
>   - Patch #2 is changed slightly from v2, I've reworked the
>     disassembler classes in a slightly different way now, in order to
>     prepare for patches #5 and #6.
>
>   - Patch #3 is unchanged from v2,
>
>   - Patch #4 is unchanged from v2,
>
>   - Patch #5 is new in v3.  I've included it here as the changes in #2
>     only make sense knowing that patch #5 is coming,
>
>   - Patch #6 is a small cleanup only possible after #2 and #5 have landed.
>
> Changes in v2:
>
>   - The first 3 patches from the v1 series were merged a while back,
>     these were all refactoring, or auxiliary features,
>
>   - There's a new #1 patch in the v2 series that does some new
>     refactoring of GDB's disassembler classes, this was required in
>     order to simplify the #3 patch,
>
>   - Patch #2 in the v2 series is largely unchanged from patch #4 in
>     the v1 series,
>
>   - The syntax highlighting work that was in the v1 series was spun
>     out into its own patch, and has been merged separately,
>
>   - The format_address helper function that appeared in the v1 series,
>     and that Simon suggested I make more general, was spun out into
>     its own patch, and merged separately,
>
>   - Finally, patch #3 in the v2 series is pretty much a complete
>     rewrite from the v1 series in order to follow the approach
>     suggested by Simon.  Results are now returned directly, either via
>     'return' or by raising an exception, in contrast to the original
>     approach which involved "setting" the result into an existing
>     state object.
>
> ---
>
> Andrew Burgess (5):
>   gdb: add new base class to gdb_disassembler
>   gdb: add extension language print_insn hook
>   gdb/python: implement the print_insn extension language hook
>   gdb: refactor the non-printing disassemblers
>   gdb: unify two dis_asm_read_memory functions in disasm.c
>
>  gdb/Makefile.in                        |   1 +
>  gdb/NEWS                               |  34 +
>  gdb/arc-linux-tdep.c                   |  15 +-
>  gdb/arc-tdep.c                         |  29 +-
>  gdb/arc-tdep.h                         |   5 -
>  gdb/arm-tdep.c                         |   4 +-
>  gdb/data-directory/Makefile.in         |   1 +
>  gdb/disasm-selftests.c                 |  70 +-
>  gdb/disasm.c                           | 172 ++---
>  gdb/disasm.h                           | 207 +++++-
>  gdb/doc/python.texi                    | 247 +++++++
>  gdb/extension-priv.h                   |  15 +
>  gdb/extension.c                        |  20 +
>  gdb/extension.h                        |  17 +
>  gdb/guile/guile.c                      |   6 +-
>  gdb/mips-tdep.c                        |   4 +-
>  gdb/python/lib/gdb/disassembler.py     | 109 +++
>  gdb/python/py-disasm.c                 | 970 +++++++++++++++++++++++++
>  gdb/python/python-internal.h           |  16 +
>  gdb/python/python.c                    |   3 +
>  gdb/s12z-tdep.c                        |  26 +-
>  gdb/testsuite/gdb.python/py-disasm.c   |  25 +
>  gdb/testsuite/gdb.python/py-disasm.exp | 150 ++++
>  gdb/testsuite/gdb.python/py-disasm.py  | 456 ++++++++++++
>  24 files changed, 2405 insertions(+), 197 deletions(-)
>  create mode 100644 gdb/python/lib/gdb/disassembler.py
>  create mode 100644 gdb/python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.py
>
> -- 
> 2.25.4



* Re: [PATCHv4 1/5] gdb: add new base class to gdb_disassembler
  2022-04-25  9:15       ` [PATCHv4 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-05-03 13:34         ` Simon Marchi
  2022-05-03 16:13           ` Andrew Burgess
  2022-05-05 17:39           ` Andrew Burgess
  0 siblings, 2 replies; 80+ messages in thread
From: Simon Marchi @ 2022-05-03 13:34 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Andrew Burgess

> From: Andrew Burgess <andrew.burgess@embecosm.com>
> 
> The motivation for this change is an upcoming Python disassembler API
> that I would like to add.  As part of that change I need to create a
> new disassembler like class that contains a disassemble_info and a
> gdbarch.  The management of these two objects is identical to how we
> manage these objects within gdb_disassembler, so it might be tempting
> for my new class to inherit from gdb_disassembler.
> 
> The problem however, is that gdb_disassembler has a tight connection
> between its constructor, and its print_insn method.  In the
> constructor the ui_file* that is passed in is replaced with a member
> variable string_file*, and then in print_insn, the contents of the
> member variable string_file are printed to the original ui_file*.
> 
> What this means is that the gdb_disassembler class has a tight
> coupling between its constructor and print_insn; the class just isn't
> intended to be used in a situation where print_insn is not going to be
> called, which is how my (upcoming) sub-class would need to operate.
> 
> My solution then, is to separate out the management of the
> disassemble_info and gdbarch into a new gdb_disassemble_info class,
> and make this class a parent of gdb_disassembler.
> 
> In arm-tdep.c and mips-tdep.c, where we used to cast the
> disassemble_info->application_data to a gdb_disassembler, we can now
> cast to a gdb_disassemble_info as we only need to access the gdbarch
> information.
> 
> Now, my new Python disassembler sub-class will still want to print
> things to an output stream, and so we will want access to the
> dis_asm_fprintf functionality for printing.
> 
> However, rather than move this printing code into the
> gdb_disassemble_info base class, I have added yet another level of
> hierarchy, a gdb_printing_disassembler, thus the class structure is
> now:
> 
>   struct gdb_disassemble_info {};
>   struct gdb_printing_disassembler : public gdb_disassemble_info {};
>   struct gdb_disassembler : public gdb_printing_disassembler {};

I can't explain it very well, but it seems a little strange to make the
other classes inherit from gdb_disassemble_info.  From the name (and the
code), I understand that gdb_disassemble_info purely holds the data
necessary for the disassemblers to do their job.  So this is not really
an IS-A relationship, but more a HAS-A.  So I would think composition
would be more natural (i.e. gdb_printing_disassembler would have a field
of type gdb_disassemble_info).

But with the callbacks we pass to struct disassemble_info (such as
fprintf_func and fprintf_styled_func), what is data and what is behavior
gets a little blurry.

It doesn't matter much in the end, I'm fine with what you have.  And it
can always be refactored later.

> +gdb_disassemble_info::gdb_disassemble_info
> +  (struct gdbarch *gdbarch, struct ui_file *stream,
> +   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
> +   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
> +   fprintf_styled_ftype fprintf_styled_func)
> +    : m_gdbarch (gdbarch)
>  {
> -  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
> -			 dis_asm_styled_fprintf);
> +  gdb_assert (fprintf_func != nullptr);
> +  gdb_assert (fprintf_styled_func != nullptr);
> +  init_disassemble_info (&m_di, stream, fprintf_func,
> +			 fprintf_styled_func);
>    m_di.flavour = bfd_target_unknown_flavour;
> -  m_di.memory_error_func = dis_asm_memory_error;
> -  m_di.print_address_func = dis_asm_print_address;
> -  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
> -     disassembler had a local optimization here.  By default it would
> -     access the executable file, instead of the target memory (there
> -     was a growing list of exceptions though).  Unfortunately, the
> -     heuristic was flawed.  Commands like "disassemble &variable"
> -     didn't work as they relied on the access going to the target.
> -     Further, it has been superseeded by trust-read-only-sections
> -     (although that should be superseeded by target_trust..._p()).  */
> -  m_di.read_memory_func = read_memory_func;
> +  if (memory_error_func != nullptr)
> +    m_di.memory_error_func = memory_error_func;
> +  if (print_address_func != nullptr)
> +    m_di.print_address_func = print_address_func;
> +  if (read_memory_func != nullptr)
> +    m_di.read_memory_func = read_memory_func;

Are the nullptr checks needed?  Are these fields initially nullptr, or
do they have some other value?


> +struct gdb_disassemble_info
> +{
> +  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
>  
> -  /* Return the gdbarch of gdb_disassembler.  */
> +  /* Return the gdbarch we are disassembing for.  */

disassembing -> disassembling

Simon


* Re: [PATCHv4 2/5] gdb: add extension language print_insn hook
  2022-04-25  9:15       ` [PATCHv4 2/5] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-05-03 13:42         ` Simon Marchi
  0 siblings, 0 replies; 80+ messages in thread
From: Simon Marchi @ 2022-05-03 13:42 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Andrew Burgess

> diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
> index d9450b51231..7c74e721c57 100644
> --- a/gdb/extension-priv.h
> +++ b/gdb/extension-priv.h
> @@ -263,6 +263,21 @@ struct extension_language_ops
>       contents, or an empty optional.  */
>    gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
>  						 gdbarch *gdbarch);
> +
> +  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
> +     is the standard libopcodes disassembler_info structure.  Bytes for the
> +     instruction being printed should be read using INFO->read_memory_func
> +     as the actual instruction bytes might be in a buffer.
> +
> +     Use INFO->fprintf_func to print the results of the disassembly, and
> +     return the length of the instruction.
> +
> +     If no instruction can be disassembled then return an empty value and
> +     other extension languages will get a chance to perform the
> +     disassembly.  */
> +  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
> +				    CORE_ADDR address,
> +				    struct disassemble_info *info);

The docs for this and for ext_lang_print_insn are very similar, and risk
getting out of sync easily.  Perhaps one can reference the other?

>  };
>  
>  /* State necessary to restore a signal handler to its previous value.  */
> diff --git a/gdb/extension.c b/gdb/extension.c
> index 8f39b86e952..62f41c6445d 100644
> --- a/gdb/extension.c
> +++ b/gdb/extension.c
> @@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
>    return result;
>  }
>  
> +/* See extension.h.  */
> +
> +gdb::optional<int>
> +ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
> +		     struct disassemble_info *info)
> +{
> +  for (const struct extension_language_defn *extlang : extension_languages)
> +    {
> +      if (extlang->ops == nullptr
> +	  || extlang->ops->print_insn == nullptr)
> +	continue;
> +      gdb::optional<int> length
> +	(extlang->ops->print_insn (gdbarch, address, info));

We usually prefer operator=, just as you would use if the result was an
int:

  gdb::optional<int> length
    = ...

Simon


* Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-04-25  9:15       ` [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
  2022-04-25 11:26         ` Eli Zaretskii
@ 2022-05-03 14:55         ` Simon Marchi
  2022-05-05 18:17           ` Andrew Burgess
  1 sibling, 1 reply; 80+ messages in thread
From: Simon Marchi @ 2022-05-03 14:55 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Andrew Burgess

> +@defivar DisassembleInfo address
> +An integer containing the address at which @value{GDBN} wishes to
> +disassemble a single instruction.
> +@end defivar

This use of defivar results in the text

  Instance Variable of DisassembleInfo: address

Just a nit, but IWBN to use Python nomenclature, this is a read-only
property or attribute.  However, if the rest of the Python doc uses
this, I don't mind continuing with what we have for consistency.

> +@defmethod Disassembler __init__ (name)
> +The constructor takes @var{name}, a string, which should be a short
> +name for this disassembler.  Currently, this name is only used in some
> +debug output.

I would probably drop that last sentence; it's bound to become
stale at some point.

> +The @code{DisassemblerResult} type is defined as a possible class to
> +represent disassembled instructions, but it is not required to use
> +this type, so long as the required attributes are present.

I think it would be more straightforward and preferable to tell people to
just return a DisassemblerResult object, unless you see some advantage
to this that I don't.

> +@smallexample
> +class ExampleDisassembler(gdb.disassembler.Disassembler):
> +    def __init__(self):
> +        super(ExampleDisassembler, self).__init__("ExampleDisassembler")

Can we use the

  super().__init__(...)

syntax, now that we dropped support for Python 2?
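
For reference, a minimal sketch of the zero-argument form.  The real
gdb.disassembler module is only importable inside GDB, so the base class
below is a stand-in (an assumption) that mirrors the quoted
Disassembler.__init__:

```python
class Disassembler:
    """Stand-in for gdb.disassembler.Disassembler (assumption)."""

    def __init__(self, name):
        self.name = name


class ExampleDisassembler(Disassembler):
    def __init__(self):
        # Python 3 zero-argument form; equivalent to
        # super(ExampleDisassembler, self).__init__("ExampleDisassembler").
        super().__init__("ExampleDisassembler")


example = ExampleDisassembler()
```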

> +
> +    def __call__(self, info):
> +        result = gdb.disassembler.builtin_disassemble(info)
> +        if result.string is not None:

Can builtin_disassemble really return None?

> +            length = result.length
> +            text = result.string + "\t## Comment"
> +            return gdb.disassembler.DisassemblerResult(length, text)

Doesn't really matter, but this could probably modify the string in
the existing DisassemblerResult object, and then return it:

  result.string += "\t## Comment"
  return result

But I'm fine with what you have, if you think it's clearer for an
example.

> +
> +gdb.disassembler.register_disassembler(ExampleDisassembler())
> +@end smallexample
> +
>  @node Python Auto-loading
>  @subsection Python Auto-loading
>  @cindex Python auto-loading
> diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
> new file mode 100644
> index 00000000000..19ec0ecf82f
> --- /dev/null
> +++ b/gdb/python/lib/gdb/disassembler.py
> @@ -0,0 +1,109 @@
> +# Copyright (C) 2021-2022 Free Software Foundation, Inc.
> +
> +# This program is free software; you can redistribute it and/or modify
> +# it under the terms of the GNU General Public License as published by
> +# the Free Software Foundation; either version 3 of the License, or
> +# (at your option) any later version.
> +#
> +# This program is distributed in the hope that it will be useful,
> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +# GNU General Public License for more details.
> +#
> +# You should have received a copy of the GNU General Public License
> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
> +
> +"""Disassembler related module."""
> +
> +import gdb
> +import _gdb.disassembler
> +
> +from _gdb.disassembler import *

Can we avoid glob imports?  They make it unclear what is actually
imported.  Or is this to re-export what _gdb.disassembler exports?  If
so, maybe add a comment.

> +
> +# Module global dictionary of gdb.disassembler.Disassembler objects.
> +# The keys of this dictionary are bfd architecture names, or the
> +# special value None.
> +#
> +# When a request to disassemble comes in we first lookup the bfd
> +# architecture name from the gdbarch, if that name exists in this
> +# dictionary then we use that Disassembler object.
> +#
> +# If there's no architecture specific disassembler then we look for
> +# the key None in this dictionary, and if that key exists, we use that
> +# disassembler.
> +#
> +# If none of the above checks found a suitable disassembler, then no
> +# disassembly is performed in Python.
> +_disassemblers_dict = {}
> +
> +
> +class Disassembler(object):
> +    """A base class from which all user implemented disassemblers must
> +    inherit."""
> +
> +    def __init__(self, name):
> +        """Constructor.  Takes a name, which should be a string, which can be
> +        used to identify this disassembler in diagnostic messages."""
> +        self.name = name
> +
> +    def __call__(self, info):
> +        """A default implementation of __call__.  All sub-classes must
> +        override this method.  Calling this default implementation will throw
> +        a NotImplementedError exception."""
> +        raise NotImplementedError("Disassembler.__call__")
> +
> +
> +def register_disassembler(disassembler, architecture=None):
> +    """Register a disassembler.  DISASSEMBLER is a sub-class of
> +    gdb.disassembler.Disassembler.  ARCHITECTURE is either None or a
> +    string, the name of an architecture known to GDB.
> +
> +    DISASSEMBLER is registered as a disassmbler for ARCHITECTURE, or

disassmbler -> disassembler

> +    all architectures when ARCHITECTURE is None.
> +
> +    Returns the previous disassembler registered with this
> +    ARCHITECTURE value.
> +    """
> +
> +    if not isinstance(disassembler, Disassembler) and disassembler is not None:
> +        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")

I see passing None for DISASSEMBLER unregisters the currently registered
disassembler.  I am not sure it was mentioned in the doc.

> +
> +    old = None
> +    if architecture in _disassemblers_dict:
> +        old = _disassemblers_dict[architecture]
> +        del _disassemblers_dict[architecture]
> +    if disassembler is not None:
> +        _disassemblers_dict[architecture] = disassembler
> +
> +    # Call the private _set_enabled function within the
> +    # _gdb.disassembler module.  This function sets a global flag
> +    # within GDB's C++ code that enables or dissables the Python
> +    # disassembler functionality, this improves performance of the
> +    # disassembler by avoiding unneeded calls into Python when we know
> +    # that no disassemblers are registered.
> +    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
> +    return old
> +
> +
> +def _print_insn(info):
> +    """This function is called by GDB when it wants to disassemble an
> +    instruction.  INFO describes the instruction to be
> +    disassembled."""
> +
> +    def lookup_disassembler(arch):
> +        try:
> +            name = arch.name()
> +            if name is None:
> +                return None
> +            if name in _disassemblers_dict:
> +                return _disassemblers_dict[name]
> +            if None in _disassemblers_dict:
> +                return _disassemblers_dict[None]
> +            return None
> +        except:
> +            return None

There doesn't seem to be anything that throws in here, is there?
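
If indeed nothing here can raise, the lookup could be reduced to a plain
dictionary fallback.  A sketch under that assumption, with the registry
passed in explicitly and the architecture name passed as a string rather
than fetched through arch.name() (both changes are only there to make
the sketch runnable outside GDB):

```python
def lookup_disassembler(disassemblers, name):
    """Return the disassembler for NAME, else the None-keyed fallback,
    else None.  DISASSEMBLERS stands in for _disassemblers_dict."""
    if name is None:
        return None
    # Architecture-specific entry first, then the catch-all entry.
    return disassemblers.get(name, disassemblers.get(None))


# Made-up registry contents, strings instead of Disassembler objects.
registry = {"riscv:rv64": "rv64-dis", None: "fallback-dis"}
specific = lookup_disassembler(registry, "riscv:rv64")
fallback = lookup_disassembler(registry, "i386")
```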

> +
> +    disassembler = lookup_disassembler(info.architecture)
> +    if disassembler is None:
> +        return None
> +    return disassembler(info)
> diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
> new file mode 100644
> index 00000000000..e8b33fecee4
> --- /dev/null
> +++ b/gdb/python/py-disasm.c
> @@ -0,0 +1,970 @@
> +/* Python interface to instruction disassembly.
> +
> +   Copyright (C) 2021-2022 Free Software Foundation, Inc.
> +
> +   This file is part of GDB.
> +
> +   This program is free software; you can redistribute it and/or modify
> +   it under the terms of the GNU General Public License as published by
> +   the Free Software Foundation; either version 3 of the License, or
> +   (at your option) any later version.
> +
> +   This program is distributed in the hope that it will be useful,
> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
> +   GNU General Public License for more details.
> +
> +   You should have received a copy of the GNU General Public License
> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
> +
> +#include "defs.h"
> +#include "python-internal.h"
> +#include "dis-asm.h"
> +#include "arch-utils.h"
> +#include "charset.h"
> +#include "disasm.h"
> +#include "progspace.h"
> +
> +/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
> +   represents a single disassembler request from GDB.  */
> +
> +struct disasm_info_object {

{ goes on next line.

> +  PyObject_HEAD
> +
> +  /* The architecture in which we are disassembling.  */
> +  struct gdbarch *gdbarch;
> +
> +  /* The program_space in which we are disassembling.  */
> +  struct program_space *program_space;
> +
> +  /* Address of the instruction to disassemble.  */
> +  bfd_vma address;
> +
> +  /* The disassemble_info passed from core GDB, this contains the
> +     callbacks necessary to read the instruction from core GDB, and to
> +     print the disassembled instruction.  */
> +  disassemble_info *gdb_info;
> +};
> +
> +extern PyTypeObject disasm_info_object_type
> +    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
> +
> +/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
> +   the result of calling the disassembler.  This is mostly the length of
> +   the disassembled instruction (in bytes), and the string representing the
> +   disassembled instruction.  */
> +
> +struct disasm_result_object {

Here too.

> +/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
> +   builtin disassembler.  The first argument is a DisassembleInfo object
> +   describing what to disassemble.  The second argument is optional and
> +   provides a mechanism to modify the memory contents that the builtin
> +   disassembler will actually disassemble.
> +
> +   Returns an instance of gdb.disassembler.DisassemblerResult, an object
> +   that wraps a disassembled instruction, or it raises a
> +   gdb.MemoryError.  */
> +
> +static PyObject *
> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
> +{
> +  PyObject *info_obj, *memory_source_obj = nullptr;
> +  static const char *keywords[] = { "info", "memory_source", nullptr };
> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
> +					&disasm_info_object_type, &info_obj,
> +					&memory_source_obj))

I'm wondering, why is there a separate memory_source parameter when info
already has a read_memory method that could potentially be overridden by
the user?
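
For context, going by the read_memory_func code quoted further down, a
memory source in this version of the patch is any object with a
read_memory(length, offset) method that returns a buffer of exactly
length bytes, where offset is relative to the address of the instruction
being disassembled.  A minimal sketch; the class name and byte values
are made up, and IndexError is a stand-in for gdb.MemoryError, which is
not available outside GDB:

```python
class FixedBytesSource:
    """Hypothetical memory source that serves bytes from a fixed buffer."""

    def __init__(self, data):
        self._data = bytes(data)

    def read_memory(self, length, offset):
        # OFFSET is relative to the instruction address, not absolute.
        if offset < 0 or offset + length > len(self._data):
            # Inside GDB this would be gdb.MemoryError (stand-in here).
            raise IndexError("read outside the supplied bytes")
        return self._data[offset:offset + length]


source = FixedBytesSource(b"\x90\x90\xc3")
chunk = source.read_memory(2, 1)
```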

> +    return nullptr;
> +
> +  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
> +  if (!disasm_info_object_is_valid (disasm_info))
> +    {
> +      PyErr_SetString (PyExc_RuntimeError,
> +		       _("DisassembleInfo is no longer valid."));
> +      return nullptr;
> +    }
> +
> +  /* A memory source is any object that provides the 'read_memory'
> +     callback.  At this point we only check for the existence of a
> +     'read_memory' attribute, if this isn't callable then we'll throw an
> +     exception from within gdbpy_disassembler::read_memory_func.  */
> +  if (memory_source_obj != nullptr)
> +    {
> +      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
> +	{
> +	  PyErr_SetString (PyExc_TypeError,
> +			   _("memory_source doesn't have a read_memory method"));
> +	  return nullptr;
> +	}
> +    }

IMO we could maybe skip this check too.  Python already produces a clear
enough exception message when trying to access an attribute that doesn't
exist, it mentions the name of the user type:

    In [4]: f = Foo()

    In [5]: f.read_memory()
    ---------------------------------------------------------------------------
    AttributeError                            Traceback (most recent call last)
    Input In [5], in <cell line: 1>()
    ----> 1 f.read_memory()

    AttributeError: 'Foo' object has no attribute 'read_memory'

> +
> +  /* Where the result will be written.  */
> +  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
> +
> +  /* Now actually perform the disassembly.  */
> +  int length
> +    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
> +			  disassembler.disasm_info ());
> +
> +  if (length == -1)
> +    {
> +
> +      /* In an ideal world, every disassembler should always call the
> +	 memory error function before returning a status of -1 as the only
> +	 error a disassembler should encounter is a failure to read
> +	 memory.  Unfortunately, there are some disassemblers who don't
> +	 follow this rule, and will return -1 without calling the memory
> +	 error function.
> +
> +	 To make the Python API simpler, we just classify everything as a
> +	 memory error, but the message has to be modified for the case
> +	 where the disassembler didn't call the memory error function.  */
> +      if (disassembler.memory_error_address ().has_value ())
> +	{
> +	  CORE_ADDR addr = *disassembler.memory_error_address ();
> +	  disasmpy_set_memory_error_for_address (addr);
> +	}
> +      else
> +	PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error");

Is there a use case for other error kinds?  For example, if the
disassembler finds that the machine code does not encode a valid
instruction, what should it do?

Maybe this is more a question for the gdb.disassembler.Disassembler
implementation side of the API.

> +      return nullptr;
> +    }
> +
> +  /* Instructions are either non-zero in length, or we got an error,
> +     indicated by a length of -1, which we handled above.  */
> +  gdb_assert (length > 0);
> +
> +  /* We should not have seen a memory error in this case.  */
> +  gdb_assert (!disassembler.memory_error_address ().has_value ());
> +
> +  /* Create an object to represent the result of the disassembler.  */
> +  gdbpy_ref<disasm_result_object> res
> +    (PyObject_New (disasm_result_object, &disasm_result_object_type));
> +  res->length = length;
> +  res->content = new string_file;

Since the DisassemblerResult object type has an __init__, maybe we
should call it?  It would be a bit of boilerplate, calling __init__
here, but it would avoid duplicating how we initialize a
disasm_result_object.

> +  *(res->content) = disassembler.release ();
> +
> +  return reinterpret_cast<PyObject *> (res.release ());
> +}
> +
> +/* Implement gdb.set_enabled function.  Takes a boolean parameter, and

gdb.disassembler._set_enabled?

> +/* This implements the disassemble_info read_memory_func callback.  This
> +   will either call the standard read memory function, or, if the user has
> +   supplied a memory source (see disasmpy_builtin_disassemble) then this
> +   will call back into Python to obtain the memory contents.
> +
> +   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
> +   success (in which case BUFF has been filled), or -1 on error, in which
> +   case the contents of BUFF are undefined.  */
> +
> +int
> +gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
> +				      unsigned int len,
> +				      struct disassemble_info *info)
> +{
> +  gdbpy_disassembler *dis
> +    = static_cast<gdbpy_disassembler *> (info->application_data);
> +  disasm_info_object *obj = dis->py_disasm_info ();
> +  PyObject *memory_source = dis->m_memory_source;
> +
> +  /* The simple case, the user didn't pass a separate memory source, so we
> +     just delegate to the standard disassemble_info read_memory_func,
> +     passing in the original disassemble_info object, which core GDB might
> +     require in order to read the instruction bytes (when reading the
> +     instruction from a buffer).  */
> +  if (memory_source == nullptr)
> +    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
> +
> +  /* The user provided a separate memory source, we need to call the
> +     read_memory method on the memory source and use the buffer it returns
> +     as the bytes of memory.  */
> +  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
> +  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
> +					       "KL", len, offset));
> +  if (result_obj == nullptr)
> +    {
> +      /* If we got a gdb.MemoryError then we ignore this and just report
> +	 that the read failed to the caller.  The caller is then
> +	 responsible for calling the memory_error_func if it wants to.
> +	 Remember, the disassembler might just be probing to see if these
> +	 bytes can be read, if we automatically call the memory error
> +	 function, we can end up registering an error prematurely.  */
> +      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
> +	PyErr_Clear ();
> +      else
> +	gdbpy_print_stack ();
> +      return -1;
> +    }
> +
> +  /* Convert the result to a buffer.  */
> +  Py_buffer py_buff;
> +  if (!PyObject_CheckBuffer (result_obj.get ())
> +      || PyObject_GetBuffer (result_obj.get(), &py_buff, PyBUF_CONTIG_RO) < 0)
> +    {
> +      PyErr_Format (PyExc_TypeError,
> +		    _("Result from read_memory is not a buffer"));
> +      gdbpy_print_stack ();
> +      return -1;
> +    }
> +
> +  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
> +     scope.  */
> +  Py_buffer_up buffer_up (&py_buff);
> +
> +  /* Validate that the buffer is the correct length.  */
> +  if (py_buff.len != len)
> +    {
> +      PyErr_Format (PyExc_ValueError,
> +		    _("Result from read_memory is incorrectly sized buffer"));

It's handy to report the expected and actual sizes in these cases.

> +      gdbpy_print_stack ();
> +      return -1;
> +    }
> +
> +  /* Copy the data out of the Python buffer and return succsess.*/

succsess -> success

> +/* A wrapper around a reference to a Python DisassembleInfo object, which
> +   ensures that the object is marked as invalid when we leave the enclosing
> +   scope.
> +
> +   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
> +   the time that function returns.  However, there's nothing to stop a user
> +   caching a reference to the DisassembleInfo, and thus keeping the object
> +   around.
> +
> +   We therefore have the notion of a DisassembleInfo becoming invalid, this
> +   happens when gdbpy_print_insn returns.  This class is responsible for
> +   marking the DisassembleInfo as invalid in its destructor.  */
> +
> +struct scoped_disasm_info_object
> +{
> +  /* Constructor.  */
> +  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
> +			     disassemble_info *info)
> +    : m_disasm_info (allocate_disasm_info_object ())
> +  {
> +    m_disasm_info->address = memaddr;
> +    m_disasm_info->gdb_info = info;
> +    m_disasm_info->gdbarch = gdbarch;
> +    m_disasm_info->program_space = current_program_space;
> +  }
> +
> +  /* Upon destruction mark m_diasm_info as invalid.  */
> +  ~scoped_disasm_info_object ()
> +  {
> +    m_disasm_info->gdb_info = nullptr;
> +  }
> +
> +  /* Return a pointer to the underlying disasm_info_object instance.  */
> +  disasm_info_object *
> +  get () const
> +  {
> +    return m_disasm_info.get ();
> +  }
> +
> +private:
> +
> +  /* Wrapper around the call to PyObject_New, this wrapper function can be
> +     called from the constructor initialization list, while PyObject_New, a
> +     macro, can't.  */
> +  static disasm_info_object *
> +  allocate_disasm_info_object ()
> +  {
> +    return (disasm_info_object *) PyObject_New (disasm_info_object,
> +						&disasm_info_object_type);
> +  }

This makes me think, is there a way for a Python user to call into the
disassembler?  Should the DisassembleInfo object have a user-callable
constructor, should the user want to construct one?

I could imagine you could do this out of nowhere:

  gdb.disassembler.builtin_disassemble(DisassembleInfo(addr, arch, progspace))

But that would skip the Python disassemblers, so a user could also want
to call this function that doesn't exist today:

  gdb.disassemble(DisassembleInfo(addr, arch, progspace))

> +
> +  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
> +     containing instance goes out of scope this reference is released,
> +     however, the user might be holding other references to the
> +     DisassembleInfo object in Python code, so the underlying object might
> +     not be deleted.  */
> +  gdbpy_ref<disasm_info_object> m_disasm_info;
> +};
> +
> +/* See python-internal.h.  */
> +
> +gdb::optional<int>
> +gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
> +		  disassemble_info *info)
> +{
> +  /* Early exit case.  This must be done as early as possible, and
> +     definitely before we enter Python environment.  The
> +     python_print_insn_enabled flag is set (from Python) only when the user
> +     has installed one (or more) Python disassemblers.  So in the common
> +     case (no custom disassembler installed) this flag will be false,
> +     allowing for a quick return.  */
> +  if (!gdb_python_initialized || !python_print_insn_enabled)
> +    return {};
> +
> +  gdbpy_enter enter_py (get_current_arch (), current_language);
> +
> +  /* The attribute we are going to lookup that provides the print_insn
> +     functionality.  */
> +  static const char *callback_name = "_print_insn";
> +
> +  /* Grab a reference to the gdb.disassembler module, and check it has the
> +     attribute that we need.  */
> +  gdbpy_ref<> gdb_python_disassembler_module
> +    (PyImport_ImportModule ("gdb.disassembler"));
> +  if (gdb_python_disassembler_module == nullptr
> +      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
> +				  callback_name))
> +    return {};

Since it's kind of expected that _print_insn is there, should this be a
gdb_assert?  Just returning silently here makes it more difficult to
investigate problems, IMO.  The only reason for the assert to trigger
would be if someone messed with the GDB Python modules, which I think is
ok.

> +
> +  /* Now grab the callback attribute from the module.  */
> +  gdbpy_ref<> hook
> +    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
> +			     callback_name));
> +  if (hook == nullptr)

This can't be true, since you already checked with
PyObject_HasAttrString.

> +    {
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  /* Create the new DisassembleInfo object we will pass into Python.  This
> +     object will be marked as invalid when we leave this scope.  */
> +  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
> +  disasm_info_object *disasm_info = scoped_disasm_info.get ();
> +
> +  /* Call into the registered disassembler to (possibly) perform the
> +     disassembly.  */
> +  PyObject *insn_disas_obj = (PyObject *) disasm_info;
> +  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
> +						    insn_disas_obj,
> +						    nullptr));
> +
> +  if (result == nullptr)
> +    {
> +      /* The call into Python code resulted in an exception.  If this was a
> +	 gdb.MemoryError, then we can figure out an address and call the
> +	 disassemble_info::memory_error_func to report the error back to
> +	 core GDB.  Any other exception type we assume means a bug in the
> +	 user's code, and print stack.  */
> +
> +      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
> +	{
> +	  /* A gdb.MemoryError might have an address attribute which
> +	     contains the address at which the memory error occurred.  If
> +	     this is the case then use this address, otherwise, fallback to
> +	     just using the address of the instruction we were asked to
> +	     disassemble.  */
> +	  PyObject *error_type, *error_value, *error_traceback;
> +	  CORE_ADDR addr;
> +
> +	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
> +
> +	  if (error_value != nullptr
> +	      && PyObject_HasAttrString (error_value, "address"))
> +	    {
> +	      PyObject *addr_obj = PyObject_GetAttrString (error_value,
> +							   "address");
> +	      if (get_addr_from_python (addr_obj, &addr) < 0)
> +		addr = disasm_info->address;
> +	    }
> +	  else
> +	    addr = disasm_info->address;
> +
> +	  PyErr_Clear ();
> +	  info->memory_error_func (-1, addr, info);
> +	  return gdb::optional<int> (-1);
> +	}
> +      else
> +	{
> +	  /* Anything that is not gdb.MemoryError.  */
> +	  gdbpy_print_stack ();
> +	  return {};
> +	}
> +    }
> +  else if (result == Py_None)
> +    {
> +      /* A return value of None indicates that the Python code could not,
> +	 or doesn't want to, disassemble this instruction.  Just return an
> +	 empty result and core GDB will try to disassemble this for us.  */
> +      return {};
> +    }
> +
> +  /* The call into Python neither raised an exception, or returned None.
> +     Check to see if the result looks valid.  */
> +  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
> +  if (length_obj == nullptr)
> +    {
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
> +  if (string_obj == nullptr)
> +    {
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +  if (!gdbpy_is_string (string_obj.get ()))
> +    {
> +      PyErr_SetString (PyExc_TypeError, _("string attribute is not a string."));
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  gdb::unique_xmalloc_ptr<char> string
> +    = gdbpy_obj_to_string (string_obj.get ());
> +  if (string == nullptr)
> +    {
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  long length;
> +  if (!gdb_py_int_as_long (length_obj.get (), &length))
> +    {
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
> +			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
> +  if (length <= 0 || length > max_insn_length)
> +    {
> +      PyErr_SetString (PyExc_ValueError, _("Invalid length attribute."));

IWBN to help the user here, say why it is invalid: the length attribute
value X exceeds the architecture's max insn length of Y.

> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  if (strlen (string.get ()) == 0)
> +    {
> +      PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty."));
> +      gdbpy_print_stack ();
> +      return {};
> +    }
> +
> +  /* Print the disassembled instruction back to core GDB, and return the
> +     length of the disassembled instruction.  */
> +  info->fprintf_func (info->stream, "%s", string.get ());
> +  return gdb::optional<int> (length);
> +}
> +
> +/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
> +   deallocating the content buffer.  */
> +
> +static void
> +disasmpy_dealloc_result (PyObject *self)
> +{
> +  disasm_result_object *obj = (disasm_result_object *) self;
> +  delete obj->content;
> +  Py_TYPE (self)->tp_free (self);
> +}
> +
> +/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
> +
> +static gdb_PyGetSetDef disasm_info_object_getset[] = {
> +  { "address", disasmpy_info_address, nullptr,
> +    "Start address of the instruction to disassemble.", nullptr },
> +  { "architecture", disasmpy_info_architecture, nullptr,
> +    "Architecture to disassemble in", nullptr },
> +  { "progspace", disasmpy_info_progspace, nullptr,
> +    "Program space to disassemble in", nullptr },
> +  { nullptr }   /* Sentinel */
> +};
> +
> +/* The methods of the gdb.disassembler.DisassembleInfo type.  */
> +
> +static PyMethodDef disasm_info_object_methods[] = {
> +  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
> +    METH_VARARGS | METH_KEYWORDS,
> +    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
> +Read LEN octets for the instruction to disassemble." },
> +  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
> +    "is_valid () -> Boolean.\n\
> +Return true if this DisassembleInfo is valid, false if not." },
> +  {nullptr}  /* Sentinel */
> +};
> +
> +/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
> +
> +static gdb_PyGetSetDef disasm_result_object_getset[] = {
> +  { "length", disasmpy_result_length, nullptr,
> +    "Length of the disassembled instruction.", nullptr },
> +  { "string", disasmpy_result_string, nullptr,
> +    "String representing the disassembled instruction.", nullptr },
> +  { nullptr }   /* Sentinel */
> +};
> +
> +/* These are the methods we add into the _gdb.disassembler module, which
> +   are then imported into the gdb.disassembler module.  These are global
> +   functions that support performing disassembly.  */
> +
> +PyMethodDef python_disassembler_methods[] =
> +{
> +  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
> +    METH_VARARGS | METH_KEYWORDS,
> +    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> None\n\
> +Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
> +gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
> +be an object with the read_memory method." },
> +  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
> +    METH_VARARGS | METH_KEYWORDS,
> +    "_set_enabled (STATE) -> None\n\
> +Set whether GDB should call into the Python _print_insn code or not." },
> +  {nullptr, nullptr, 0, nullptr}
> +};
> +
> +/* Structure to define the _gdb.disassembler module.  */
> +
> +static struct PyModuleDef python_disassembler_module_def =
> +{
> +  PyModuleDef_HEAD_INIT,
> +  "_gdb.disassembler",
> +  nullptr,
> +  -1,
> +  python_disassembler_methods,
> +  nullptr,
> +  nullptr,
> +  nullptr,
> +  nullptr
> +};
> +
> +/* Called to initialize the Python structures in this file.  */
> +
> +int
> +gdbpy_initialize_disasm
> +(void)

Parenthesis on the previous line, and remove void?

Simon


* Re: [PATCHv4 1/5] gdb: add new base class to gdb_disassembler
  2022-05-03 13:34         ` Simon Marchi
@ 2022-05-03 16:13           ` Andrew Burgess
  2022-05-05 17:39           ` Andrew Burgess
  1 sibling, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-03 16:13 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Andrew Burgess

Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

>> From: Andrew Burgess <andrew.burgess@embecosm.com>
>> 
>> The motivation for this change is an upcoming Python disassembler API
>> that I would like to add.  As part of that change I need to create a
>> new disassembler like class that contains a disassemble_info and a
>> gdbarch.  The management of these two objects is identical to how we
>> manage these objects within gdb_disassembler, so it might be tempting
>> for my new class to inherit from gdb_disassembler.
>> 
>> The problem however, is that gdb_disassembler has a tight connection
>> between its constructor, and its print_insn method.  In the
>> constructor the ui_file* that is passed in is replaced with a member
>> variable string_file*, and then in print_insn, the contents of the
>> member variable string_file are printed to the original ui_file*.
>> 
>> What this means is that the gdb_disassembler class has a tight
>> coupling between its constructor and print_insn; the class just isn't
>> intended to be used in a situation where print_insn is not going to be
>> called, which is how my (upcoming) sub-class would need to operate.
>> 
>> My solution then, is to separate out the management of the
>> disassemble_info and gdbarch into a new gdb_disassemble_info class,
>> and make this class a parent of gdb_disassembler.
>> 
>> In arm-tdep.c and mips-tdep.c, where we used to cast the
>> disassemble_info->application_data to a gdb_disassembler, we can now
>> cast to a gdb_disassemble_info as we only need to access the gdbarch
>> information.
>> 
>> Now, my new Python disassembler sub-class will still want to print
>> things to an output stream, and so we will want access to the
>> dis_asm_fprintf functionality for printing.
>> 
>> However, rather than move this printing code into the
>> gdb_disassemble_info base class, I have added yet another level of
>> hierarchy, a gdb_printing_disassembler, thus the class structure is
>> now:
>> 
>>   struct gdb_disassemble_info {};
>>   struct gdb_printing_disassembler : public gdb_disassemble_info {};
>>   struct gdb_disassembler : public gdb_printing_disassembler {};
>
> I can't explain it very well, but it seems a little strange to make the
> other classes inherit from gdb_disassemble_info.  From the name (and the
> code), I understand that gdb_disassemble_info purely holds the data
> necessary for the disassemblers to do their job.  So this is not really
> an IS-A relationship, but more a HAS-A.  So I would think composition
> would be more natural (i.e. gdb_printing_disassembler would have a field
> of type gdb_disassemble_info).

I understand exactly where your unease comes from, and I had the same
feeling.  The problem, I think, is the naming of these things.

So we used to have two classes.  There was 'gdb_disassembler', which is
really a wrapper around the libopcodes 'disassemble_info'.

Then we had 'gdb_pretty_print_disassembler' which had-a
gdb_disassembler.

In one of my original versions of this patch I added just
gdb_disassemble_info as a base class, then gdb_disassembler inherited
from that.

At some point the patch expanded, and the new classes took the
*_disassembler suffix rather than the *_disassemble_info suffix, which
was maybe a mistake.

So the classes you listed above, gdb_printing_disassembler and
gdb_disassembler, really are specialisations of gdb_disassemble_info,
but maybe this would all be better if I renamed all these things to
use a _disassemble_info suffix?

Thanks,
Andrew


>
> But with the callbacks we pass to struct disassemble_info (such as
> fprintf_func and fprintf_styled_func), what is data and what is behavior
> gets a little blurry.
>
> It doesn't matter much in the end, I'm fine with what you have.  And it
> can always be refactored later.
>
>> +gdb_disassemble_info::gdb_disassemble_info
>> +  (struct gdbarch *gdbarch, struct ui_file *stream,
>> +   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
>> +   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
>> +   fprintf_styled_ftype fprintf_styled_func)
>> +    : m_gdbarch (gdbarch)
>>  {
>> -  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
>> -			 dis_asm_styled_fprintf);
>> +  gdb_assert (fprintf_func != nullptr);
>> +  gdb_assert (fprintf_styled_func != nullptr);
>> +  init_disassemble_info (&m_di, stream, fprintf_func,
>> +			 fprintf_styled_func);
>>    m_di.flavour = bfd_target_unknown_flavour;
>> -  m_di.memory_error_func = dis_asm_memory_error;
>> -  m_di.print_address_func = dis_asm_print_address;
>> -  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
>> -     disassembler had a local optimization here.  By default it would
>> -     access the executable file, instead of the target memory (there
>> -     was a growing list of exceptions though).  Unfortunately, the
>> -     heuristic was flawed.  Commands like "disassemble &variable"
>> -     didn't work as they relied on the access going to the target.
>> -     Further, it has been superseeded by trust-read-only-sections
>> -     (although that should be superseeded by target_trust..._p()).  */
>> -  m_di.read_memory_func = read_memory_func;
>> +  if (memory_error_func != nullptr)
>> +    m_di.memory_error_func = memory_error_func;
>> +  if (print_address_func != nullptr)
>> +    m_di.print_address_func = print_address_func;
>> +  if (read_memory_func != nullptr)
>> +    m_di.read_memory_func = read_memory_func;
>
> Are the nullptr checks needed?  Are these fields initially nullptr, or
> they have some other value?
>
>
>> +struct gdb_disassemble_info
>> +{
>> +  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
>>  
>> -  /* Return the gdbarch of gdb_disassembler.  */
>> +  /* Return the gdbarch we are disassembing for.  */
>
> disassembing -> disassembling
>
> Simon



* Re: [PATCHv4 1/5] gdb: add new base class to gdb_disassembler
  2022-05-03 13:34         ` Simon Marchi
  2022-05-03 16:13           ` Andrew Burgess
@ 2022-05-05 17:39           ` Andrew Burgess
  1 sibling, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-05 17:39 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Andrew Burgess

Simon Marchi <simark@simark.ca> writes:

>> From: Andrew Burgess <andrew.burgess@embecosm.com>
>> 
>> The motivation for this change is an upcoming Python disassembler API
>> that I would like to add.  As part of that change I need to create a
>> new disassembler like class that contains a disassemble_info and a
>> gdbarch.  The management of these two objects is identical to how we
>> manage these objects within gdb_disassembler, so it might be tempting
>> for my new class to inherit from gdb_disassembler.
>> 
>> The problem however, is that gdb_disassembler has a tight connection
>> between its constructor, and its print_insn method.  In the
>> constructor the ui_file* that is passed in is replaced with a member
>> variable string_file*, and then in print_insn, the contents of the
>> member variable string_file are printed to the original ui_file*.
>> 
>> What this means is that the gdb_disassembler class has a tight
>> coupling between its constructor and print_insn; the class just isn't
>> intended to be used in a situation where print_insn is not going to be
>> called, which is how my (upcoming) sub-class would need to operate.
>> 
>> My solution then, is to separate out the management of the
>> disassemble_info and gdbarch into a new gdb_disassemble_info class,
>> and make this class a parent of gdb_disassembler.
>> 
>> In arm-tdep.c and mips-tdep.c, where we used to cast the
>> disassemble_info->application_data to a gdb_disassembler, we can now
>> cast to a gdb_disassemble_info as we only need to access the gdbarch
>> information.
>> 
>> Now, my new Python disassembler sub-class will still want to print
>> things to an output stream, and so we will want access to the
>> dis_asm_fprintf functionality for printing.
>> 
>> However, rather than move this printing code into the
>> gdb_disassemble_info base class, I have added yet another level of
>> hierarchy, a gdb_printing_disassembler, thus the class structure is
>> now:
>> 
>>   struct gdb_disassemble_info {};
>>   struct gdb_printing_disassembler : public gdb_disassemble_info {};
>>   struct gdb_disassembler : public gdb_printing_disassembler {};
>
> I can't explain it very well, but it seems a little strange to make the
> other classes inherit from gdb_disassemble_info.  From the name (and the
> code), I understand that gdb_disassemble_info purely holds the data
> necessary for the disassemblers to do their job.  So this is not really
> an IS-A relationship, but more a HAS-A.  So I would think composition
> would be more natural (i.e. gdb_printing_disassembler would have a field
> of type gdb_disassemble_info).
>
> But with the callbacks we pass to struct disassemble_info (such as
> fprintf_func and fprintf_styled_func), what is data and what is behavior
> gets a little blurry.
>
> It doesn't matter much in the end, I'm fine with what you have.  And it
> can always be refactored later.

Thanks.  I took another look at this code and started by simplifying
things so that we had just gdb_disassemble_info and gdb_disassembler.
Instead of having gdb_disassembler inherit from gdb_disassemble_info,
gdb_disassembler contained a gdb_disassemble_info.

Then I pulled the rest of the patches in on top and tried to get things
working again.  In the end I liked the new code a lot less than what I
have here.

The problem I ran into is that it is the gdb_disassemble_info that is
held in the disassemble_info::application_data field.  It is important
(I think) that this is the case, as we want code like gdb_print_insn_arm
(arm-tdep.c) to be able to make use of the application_data without
knowing precisely which disassembler class triggered the disassembly
call, e.g. we could be calling from a Python disassembler (after patch
#3 in this series), or from the "normal" disassembler (e.g. after an x/i
command), or we could be calling from the gdb_disassembler_test
disassembler (from the selftest files).  In all cases the
application_data is always a sub-class of gdb_disassemble_info.

So, when gdb_disassembler no longer inherits from the base type, but
instead contains a base type then what do we place in the
application_data field?  I think the only choice is to still place a
gdb_disassemble_info in that field, but this means that we end up
needing a mechanism to get from the application_data back to the actual
disassembler instance, here's why:

Currently my proposal is something like this (after patch #3):

  struct gdb_disassemble_info
  struct gdb_printing_disassembler : public gdb_disassemble_info
  struct gdb_disassembler : public gdb_printing_disassembler
  struct gdbpy_disassembler : public gdb_printing_disassembler

Each of these types is, or derives from, gdb_disassemble_info, and so,
callbacks like gdb_print_insn_arm are fine to cast the application_data
to the gdb_disassemble_info type and make use of it.

But then we have other callbacks,
e.g. gdb_disassembler::dis_asm_print_address and (from patch #3)
gdbpy_disassembler::read_memory_func, these callbacks are disassembler
specific.  In these we don't cast the application_data to
gdb_disassemble_info, but to the specific disassembler type,
gdb_disassembler and gdbpy_disassembler in the two examples I listed
above.  There's another example in
gdb_disassembler_test::read_memory, and there might be more about.

The neat thing (I think) about using inheritance here is that this all
"just works", a callback defined within a particular disassembler can
assume that disassembler type.  A more generic callback can just assume
the generic gdb_disassemble_info type.
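
To make that concrete, here is a rough toy model of the idea.  It is
written in Python purely for brevity; the real classes are C++ and
application_data is a void* that each callback casts.  The class names
mirror the ones in the patch, but the bodies, the 'arch' and 'memory'
attributes, and the two callback functions are invented for this sketch:

```python
class GdbDisassembleInfo:
    """Toy stand-in for gdb_disassemble_info: holds the architecture."""

    def __init__(self, arch):
        self.arch = arch
        # Models disassemble_info::application_data, which in the real
        # code is a void* pointing back at the gdb_disassemble_info.
        self.application_data = self


class GdbPrintingDisassembler(GdbDisassembleInfo):
    """Toy stand-in for gdb_printing_disassembler."""


class GdbpyDisassembler(GdbPrintingDisassembler):
    """Toy stand-in for gdbpy_disassembler; 'memory' is hypothetical."""

    def __init__(self, arch, memory):
        super().__init__(arch)
        self.memory = memory


def generic_callback(application_data):
    """Like gdb_print_insn_arm: only assumes the base type."""
    assert isinstance(application_data, GdbDisassembleInfo)
    return application_data.arch


def gdbpy_read_memory(application_data, offset, length):
    """Disassembler specific callback: assumes the derived type."""
    assert isinstance(application_data, GdbpyDisassembler)
    return application_data.memory[offset:offset + length]


dis = GdbpyDisassembler("arm", b"\x01\x02\x03\x04")
arch = generic_callback(dis.application_data)
octets = gdbpy_read_memory(dis.application_data, 1, 2)
```

Both callbacks receive the same object; the only difference is how much
each one assumes about its type.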

So after I tried restructuring things like this:

  struct gdb_disassemble_info
  struct gdb_disassembler
  struct gdbpy_disassembler

Where both gdb_disassembler and gdbpy_disassembler contain a
gdb_disassemble_info.  I still ended up placing the gdb_disassemble_info
into the application_data field.  However, now in the disassembler
specific callbacks how do I get back to the correct disassembler type?

I could give the gdb_disassemble_info a generic void* field, and then
place any pointer I want in there, but, I really don't want to do that.
Passing generic pointers around is almost never necessary (I claim) when
we have inheritance.  So what I did instead was this:

  struct gdb_disassemble_info
  struct gdb_disassembler
  struct gdbpy_disassemble_info : public gdb_disassemble_info
  struct gdbpy_disassembler

Now gdb_disassembler contains a gdb_disassemble_info, and
gdbpy_disassembler contains a gdbpy_disassemble_info.  The callback
problem is solved; generic callbacks can still treat the
application_data as a gdb_disassemble_info, but the disassembler
specific callbacks will know it's a particular sub-class of
gdb_disassemble_info (e.g. gdbpy_disassemble_info), and can cast the
application_data to that type.  The gdbpy_disassemble_info then holds a
pointer to the gdbpy_disassembler, and all is good.
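
As a rough illustration of this composition layout, here is a toy model
in Python (the real code is C++, and application_data is a void*).  The
class names follow the list above; the 'owner', 'arch' and 'memory'
attributes and the callback functions are made up for the sketch:

```python
class GdbDisassembleInfo:
    """Toy stand-in for gdb_disassemble_info."""

    def __init__(self, arch):
        self.arch = arch
        # Models disassemble_info::application_data.
        self.application_data = self


class GdbpyDisassembleInfo(GdbDisassembleInfo):
    """Info sub-class whose extra job is to point back at its owner."""

    def __init__(self, arch, owner):
        super().__init__(arch)
        self.owner = owner


class GdbpyDisassembler:
    """No longer a sub-class; it contains its info object instead."""

    def __init__(self, arch, memory):
        self.memory = memory
        self.info = GdbpyDisassembleInfo(arch, owner=self)


def generic_callback(application_data):
    """Generic callbacks still only assume the base info type."""
    assert isinstance(application_data, GdbDisassembleInfo)
    return application_data.arch


def gdbpy_read_memory(application_data, offset, length):
    """Specific callback: cast to the info sub-class, then hop to the owner."""
    assert isinstance(application_data, GdbpyDisassembleInfo)
    disassembler = application_data.owner
    return disassembler.memory[offset:offset + length]


dis = GdbpyDisassembler("arm", b"\x01\x02\x03\x04")
octets = gdbpy_read_memory(dis.info.application_data, 0, 2)
```

Every disassembler that needs its own callbacks has to repeat this
info sub-class plus back-pointer arrangement.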

Of course, the above is only a sub-set of what's needed, the
disassembler self-tests end up needing another gdb_disassemble_info
sub-class, and maybe there would be more needed if I'd actually taken
this patch to completion?

It feels like all I've done is move complexity out of disasm.h and
forced the complexity onto each user of the disassembler, which isn't
great.  The current approach might not be great, but actually using the
disassembler is pretty straightforward.

The other problem with the above is knowing what state to keep in
e.g. gdbpy_disassembler and what to move into gdbpy_disassemble_info?  I
played around with this a bit, but to some degree I felt like I was
forcing myself to keep gdbpy_disassembler around, it almost felt more
natural to move everything into gdbpy_disassemble_info, in which case,
aren't I almost back where I started?  Except I've moved the code out of
disasm.h and to the site of each user?

I know you said you'd be willing to see my original patch go in and
refactor later - and I am of course happy to see the code reworked later
- but I wanted to see if I could get something that you'd be happy with.

Anyway, if you have any suggestions I'm willing to take another pass at
this code.

>
>> +gdb_disassemble_info::gdb_disassemble_info
>> +  (struct gdbarch *gdbarch, struct ui_file *stream,
>> +   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
>> +   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
>> +   fprintf_styled_ftype fprintf_styled_func)
>> +    : m_gdbarch (gdbarch)
>>  {
>> -  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
>> -			 dis_asm_styled_fprintf);
>> +  gdb_assert (fprintf_func != nullptr);
>> +  gdb_assert (fprintf_styled_func != nullptr);
>> +  init_disassemble_info (&m_di, stream, fprintf_func,
>> +			 fprintf_styled_func);
>>    m_di.flavour = bfd_target_unknown_flavour;
>> -  m_di.memory_error_func = dis_asm_memory_error;
>> -  m_di.print_address_func = dis_asm_print_address;
>> -  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
>> -     disassembler had a local optimization here.  By default it would
>> -     access the executable file, instead of the target memory (there
>> -     was a growing list of exceptions though).  Unfortunately, the
>> -     heuristic was flawed.  Commands like "disassemble &variable"
>> -     didn't work as they relied on the access going to the target.
>> -     Further, it has been superseeded by trust-read-only-sections
>> -     (although that should be superseeded by target_trust..._p()).  */
>> -  m_di.read_memory_func = read_memory_func;
>> +  if (memory_error_func != nullptr)
>> +    m_di.memory_error_func = memory_error_func;
>> +  if (print_address_func != nullptr)
>> +    m_di.print_address_func = print_address_func;
>> +  if (read_memory_func != nullptr)
>> +    m_di.read_memory_func = read_memory_func;
>
> Are the nullptr checks needed?  Are these fields initially nullptr, or
> they have some other value?

Yes they are needed.  The three fields protected here are given non-null
values by the call to init_disassemble_info, we only want to replace
these defaults with a user supplied value if the user supplied value is
not nullptr.

Placing nullptr into these fields will cause the disassembler to crash!

This was documented in the header file on gdb_disassemble_info, but I've
added an extra comment in the code here to make it clear what's going on.
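
The pattern is easy to show in miniature.  This is only a Python model
of the idea, not the C++ from the patch: the default_* functions stand
in for whatever init_disassemble_info installs, and every name here is
made up for illustration:

```python
def default_read_memory(addr, length):
    """Stand-in for the default read_memory_func."""
    raise NotImplementedError("default read_memory stand-in")


def default_memory_error(status, addr):
    """Stand-in for the default memory_error_func."""
    return "memory error at 0x%x" % addr


def default_print_address(addr):
    """Stand-in for the default print_address_func."""
    return "0x%x" % addr


class DisassembleInfoModel:
    def __init__(self, read_memory=None, memory_error=None,
                 print_address=None):
        # Models init_disassemble_info: every callback starts out with
        # a usable default.
        self.read_memory = default_read_memory
        self.memory_error = default_memory_error
        self.print_address = default_print_address

        # Only replace a default when the caller really supplied
        # something; storing None would leave an uncallable slot.
        if read_memory is not None:
            self.read_memory = read_memory
        if memory_error is not None:
            self.memory_error = memory_error
        if print_address is not None:
            self.print_address = print_address


info = DisassembleInfoModel(print_address=lambda addr: "<%d>" % addr)
```

Here only print_address is overridden; the other two slots keep their
defaults, so they stay safe to call.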

>
>
>> +struct gdb_disassemble_info
>> +{
>> +  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
>>  
>> -  /* Return the gdbarch of gdb_disassembler.  */
>> +  /* Return the gdbarch we are disassembing for.  */
>
> disassembing -> disassembling

Fixed.

I'm working on updating patch #3 and will repost this series once
that's done.

Thanks,
Andrew



* Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-03 14:55         ` Simon Marchi
@ 2022-05-05 18:17           ` Andrew Burgess
  2022-05-24  1:16             ` Simon Marchi
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-05-05 18:17 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Andrew Burgess

Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

>> +@defivar DisassembleInfo address
>> +An integer containing the address at which @value{GDBN} wishes to
>> +disassemble a single instruction.
>> +@end defivar
>
> This use of defivar results in the text
>
>   Instance Variable of DisassembleInfo: address
>
> Just a nit, but IWBN to use Python nomenclature, this is a read-only
> property or attribute.  However, if the rest of the Python doc uses
> this, I don't mind continuing with what we have for consistency.
>
>> +@defmethod Disassembler __init__ (name)
>> +The constructor takes @var{name}, a string, which should be a short
>> +name for this disassembler.  Currently, this name is only used in some
>> +debug output.
>
> I would probably avoid saying that last sentence, it's bound to become
> stale at some point.
>
>> +The @code{DisassemblerResult} type is defined as a possible class to
>> +represent disassembled instructions, but it is not required to use
>> +this type, so long as the required attributes are present.
>
> I think it would be more straighforward and preferable to tell people to
> just return a DisassemblerResult object, unless you see some advantage
> to this that I don't.
>
>> +@smallexample
>> +class ExampleDisassembler(gdb.disassembler.Disassembler):
>> +    def __init__(self):
>> +        super(ExampleDisassembler, self).__init__("ExampleDisassembler")
>
> Can we use the
>
>   super().__init__(...)
>
> syntax, now that we dropped support for Python 2?

Updated.

>
>> +
>> +    def __call__(self, info):
>> +        result = gdb.disassembler.builtin_disassemble(info)
>> +        if result.string is not None:
>
> Can builtin_disassemble really return None?

Nope, fixed.

>
>> +            length = result.length
>> +            text = result.string + "\t## Comment"
>> +            return gdb.disassembler.DisassemblerResult(length, text)
>
> Doesn't really matter, but this could probably modify the string in
> the existing DisassemblerResult object, and then return it:
>
>   result.string += "\t## Comment"
>   return result
>
> But I'm fine with what you have, if you think it's clearer for an
> example.

The problem is all the properties of DisassemblerResult are read-only.
Given it's pretty light-weight I didn't really see any problem just
creating a new one.

I suspect that I might end up changing that in the future, but I don't
see any great need to allow for modifications right now.  I figure
extending the API to allow modifications later is fine if/when that
becomes critical.

Let me know if that's going to be a problem and I can get the setting
code added now.
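
For anyone following along outside of GDB, here is a small stand-alone
sketch of the behaviour I'm describing.  The DisassemblerResult class
below is a pure Python stand-in that I've made up to mimic the
read-only 'length' and 'string' attributes; it is not the real
gdb.disassembler.DisassemblerResult, and add_comment is a hypothetical
helper:

```python
class DisassemblerResult:
    """Stand-in with read-only 'length' and 'string' attributes."""

    def __init__(self, length, string):
        self._length = length
        self._string = string

    @property
    def length(self):
        return self._length

    @property
    def string(self):
        return self._string


def add_comment(result, comment):
    """Return a new result with COMMENT appended to the string."""
    text = result.string + "\t## " + comment
    return DisassemblerResult(result.length, text)


original = DisassemblerResult(4, "nop")
annotated = add_comment(original, "Comment")

# Trying to modify the existing object in place fails, as the
# attributes have no setter.
try:
    original.string += "\t## Comment"
    modified_in_place = True
except AttributeError:
    modified_in_place = False
```

Building a fresh object costs very little, which is why the example in
the docs takes that route.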

>
>> +
>> +gdb.disassembler.register_disassembler(ExampleDisassembler())
>> +@end smallexample
>> +
>>  @node Python Auto-loading
>>  @subsection Python Auto-loading
>>  @cindex Python auto-loading
>> diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
>> new file mode 100644
>> index 00000000000..19ec0ecf82f
>> --- /dev/null
>> +++ b/gdb/python/lib/gdb/disassembler.py
>> @@ -0,0 +1,109 @@
>> +# Copyright (C) 2021-2022 Free Software Foundation, Inc.
>> +
>> +# This program is free software; you can redistribute it and/or modify
>> +# it under the terms of the GNU General Public License as published by
>> +# the Free Software Foundation; either version 3 of the License, or
>> +# (at your option) any later version.
>> +#
>> +# This program is distributed in the hope that it will be useful,
>> +# but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +# GNU General Public License for more details.
>> +#
>> +# You should have received a copy of the GNU General Public License
>> +# along with this program.  If not, see <http://www.gnu.org/licenses/>.
>> +
>> +"""Disassembler related module."""
>> +
>> +import gdb
>> +import _gdb.disassembler
>> +
>> +from _gdb.disassembler import *
>
> Can we avoid glob imports?  This makes it not clear what is actually
> imported.  Or, is this to re-export what _gdb.disassembler exports?  If
> so, maybe add a comment.

It's the second case.  We do something similar in
gdb/python/lib/gdb/__init__.py.  I've added a comment in this case
explaining what's going on.

>
>> +
>> +# Module global dictionary of gdb.disassembler.Disassembler objects.
>> +# The keys of this dictionary are bfd architecture names, or the
>> +# special value None.
>> +#
>> +# When a request to disassemble comes in we first lookup the bfd
>> +# architecture name from the gdbarch, if that name exists in this
>> +# dictionary then we use that Disassembler object.
>> +#
>> +# If there's no architecture specific disassembler then we look for
>> +# the key None in this dictionary, and if that key exists, we use that
>> +# disassembler.
>> +#
>> +# If none of the above checks found a suitable disassembler, then no
>> +# disassembly is performed in Python.
>> +_disassemblers_dict = {}
>> +
>> +
>> +class Disassembler(object):
>> +    """A base class from which all user implemented disassemblers must
>> +    inherit."""
>> +
>> +    def __init__(self, name):
>> +        """Constructor.  Takes a name, which should be a string, which can be
>> +        used to identify this disassembler in diagnostic messages."""
>> +        self.name = name
>> +
>> +    def __call__(self, info):
>> +        """A default implementation of __call__.  All sub-classes must
>> +        override this method.  Calling this default implementation will throw
>> +        a NotImplementedError exception."""
>> +        raise NotImplementedError("Disassembler.__call__")
>> +
>> +
>> +def register_disassembler(disassembler, architecture=None):
>> +    """Register a disassembler.  DISASSEMBLER is a sub-class of
>> +    gdb.disassembler.Disassembler.  ARCHITECTURE is either None or a
>> +    string, the name of an architecture known to GDB.
>> +
>> +    DISASSEMBLER is registered as a disassmbler for ARCHITECTURE, or
>
> disassmbler -> disassembler

Fixed.

>
>> +    all architectures when ARCHITECTURE is None.
>> +
>> +    Returns the previous disassembler registered with this
>> +    ARCHITECTURE value.
>> +    """
>> +
>> +    if not isinstance(disassembler, Disassembler) and disassembler is not None:
>> +        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
>
> I see passing None for DISASSEMBLER unregisters the currently registered
> disassembler.  I am not sure it was mentioned in the doc.
>
>> +
>> +    old = None
>> +    if architecture in _disassemblers_dict:
>> +        old = _disassemblers_dict[architecture]
>> +        del _disassemblers_dict[architecture]
>> +    if disassembler is not None:
>> +        _disassemblers_dict[architecture] = disassembler
>> +
>> +    # Call the private _set_enabled function within the
>> +    # _gdb.disassembler module.  This function sets a global flag
>> +    # within GDB's C++ code that enables or dissables the Python
>> +    # disassembler functionality, this improves performance of the
>> +    # disassembler by avoiding unneeded calls into Python when we know
>> +    # that no disassemblers are registered.
>> +    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
>> +    return old
>> +
>> +
>> +def _print_insn(info):
>> +    """This function is called by GDB when it wants to disassemble an
>> +    instruction.  INFO describes the instruction to be
>> +    disassembled."""
>> +
>> +    def lookup_disassembler(arch):
>> +        try:
>> +            name = arch.name()
>> +            if name is None:
>> +                return None
>> +            if name in _disassemblers_dict:
>> +                return _disassemblers_dict[name]
>> +            if None in _disassemblers_dict:
>> +                return _disassemblers_dict[None]
>> +            return None
>> +        except:
>> +            return None
>
> There doesn't seem to be anything that throws in here, is there?
>
>> +
>> +    disassembler = lookup_disassembler(info.architecture)
>> +    if disassembler is None:
>> +        return None
>> +    return disassembler(info)
>> diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
>> new file mode 100644
>> index 00000000000..e8b33fecee4
>> --- /dev/null
>> +++ b/gdb/python/py-disasm.c
>> @@ -0,0 +1,970 @@
>> +/* Python interface to instruction disassembly.
>> +
>> +   Copyright (C) 2021-2022 Free Software Foundation, Inc.
>> +
>> +   This file is part of GDB.
>> +
>> +   This program is free software; you can redistribute it and/or modify
>> +   it under the terms of the GNU General Public License as published by
>> +   the Free Software Foundation; either version 3 of the License, or
>> +   (at your option) any later version.
>> +
>> +   This program is distributed in the hope that it will be useful,
>> +   but WITHOUT ANY WARRANTY; without even the implied warranty of
>> +   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
>> +   GNU General Public License for more details.
>> +
>> +   You should have received a copy of the GNU General Public License
>> +   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
>> +
>> +#include "defs.h"
>> +#include "python-internal.h"
>> +#include "dis-asm.h"
>> +#include "arch-utils.h"
>> +#include "charset.h"
>> +#include "disasm.h"
>> +#include "progspace.h"
>> +
>> +/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
>> +   represents a single disassembler request from GDB.  */
>> +
>> +struct disasm_info_object {
>
> { goes on next line.

Doh.  I wondered how I made this mistake.  Turns out it's just copy 'n'
paste from other bits of Python code.  Anyway, I fixed all the ones in
this file.

>
>> +  PyObject_HEAD
>> +
>> +  /* The architecture in which we are disassembling.  */
>> +  struct gdbarch *gdbarch;
>> +
>> +  /* The program_space in which we are disassembling.  */
>> +  struct program_space *program_space;
>> +
>> +  /* Address of the instruction to disassemble.  */
>> +  bfd_vma address;
>> +
>> +  /* The disassemble_info passed from core GDB, this contains the
>> +     callbacks necessary to read the instruction from core GDB, and to
>> +     print the disassembled instruction.  */
>> +  disassemble_info *gdb_info;
>> +};
>> +
>> +extern PyTypeObject disasm_info_object_type
>> +    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
>> +
>> +/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
>> +   the result of calling the disassembler.  This is mostly the length of
>> +   the disassembled instruction (in bytes), and the string representing the
>> +   disassembled instruction.  */
>> +
>> +struct disasm_result_object {
>
> Here too.
>
>> +/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
>> +   builtin disassembler.  The first argument is a DisassembleInfo object
>> +   describing what to disassemble.  The second argument is optional and
>> +   provides a mechanism to modify the memory contents that the builtin
>> +   disassembler will actually disassemble.
>> +
>> +   Returns an instance of gdb.disassembler.DisassemblerResult, an object
>> +   that wraps a disassembled instruction, or it raises a
>> +   gdb.MemoryError.  */
>> +
>> +static PyObject *
>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
>> +{
>> +  PyObject *info_obj, *memory_source_obj = nullptr;
>> +  static const char *keywords[] = { "info", "memory_source", nullptr };
>> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
>> +					&disasm_info_object_type, &info_obj,
>> +					&memory_source_obj))
>
> I'm wondering, why is there a separate memory_source parameter when info
> already has a read_memory method, that could potentially be overriden by
> the user?

Here's how the API would be used right now:

  class MyDisassembler(Disassembler):
    def __init__(self, name):
      super().__init__(name)
      self.__info = None

    def __call__(self, info):
      self.__info = info
      result = gdb.disassembler.builtin_disassemble(info, self)
      return result

    def read_memory(self, length, offset):
      return self.__info.read_memory(length, offset)

This is obviously pretty pointless: the memory source just calls the
standard read_memory routine, so you'll get the same behaviour as if no
memory source was passed at all, but it shows how the API works.

If we wanted to override the DisassembleInfo.read_memory routine we'd
do something like this:

  class MyInfo(DisassembleInfo):
    def __init__(self, old_info):
      super().__init__(old_info)

    def read_memory(self, length, offset):
      return super().read_memory(length, offset)

  class MyDisassembler(Disassembler):
    def __init__(self, name):
      super().__init__(name)

    def __call__(self, info):
      wrapped_info = MyInfo(info)
      result = gdb.disassembler.builtin_disassemble(wrapped_info)
      return result

What are your thoughts on that?  I think that would be pretty easy to
implement if you feel it's an improvement.
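
To make the sub-classing option a little more concrete, here is a
minimal sketch that runs outside GDB.  BaseInfo and
fake_builtin_disassemble are hypothetical stand-ins for DisassembleInfo
and builtin_disassemble, invented only for this illustration; the real
types may well behave differently.

```python
class BaseInfo:
    """Hypothetical stand-in for DisassembleInfo; not the real GDB type."""

    def __init__(self, memory, address):
        self._memory = memory      # bytes backing the fake inferior
        self.address = address     # address of the instruction

    def read_memory(self, length, offset=0):
        start = self.address + offset
        return self._memory[start:start + length]


class PatchedInfo(BaseInfo):
    """Wraps an existing info object and overrides read_memory."""

    def __init__(self, old_info, replacement):
        super().__init__(old_info._memory, old_info.address)
        self._replacement = replacement

    def read_memory(self, length, offset=0):
        # Serve bytes from the replacement buffer instead of the inferior.
        return self._replacement[offset:offset + length]


def fake_builtin_disassemble(info):
    """Hypothetical stand-in: 'disassembles' one byte as a .byte directive."""
    (byte,) = info.read_memory(1)
    return ".byte 0x%02x" % byte


info = BaseInfo(b"\x90\xc3", 0)
plain = fake_builtin_disassemble(info)
patched = fake_builtin_disassemble(PatchedInfo(info, b"\xcc"))
```

The point of the sketch is only that the disassembler sees whatever
read_memory returns, so overriding that one method on a sub-class is
enough to substitute the instruction bytes.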

>
>> +    return nullptr;
>> +
>> +  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
>> +  if (!disasm_info_object_is_valid (disasm_info))
>> +    {
>> +      PyErr_SetString (PyExc_RuntimeError,
>> +		       _("DisassembleInfo is no longer valid."));
>> +      return nullptr;
>> +    }
>> +
>> +  /* A memory source is any object that provides the 'read_memory'
>> +     callback.  At this point we only check for the existence of a
>> +     'read_memory' attribute, if this isn't callable then we'll throw an
>> +     exception from within gdbpy_disassembler::read_memory_func.  */
>> +  if (memory_source_obj != nullptr)
>> +    {
>> +      if (!PyObject_HasAttrString (memory_source_obj, "read_memory"))
>> +	{
>> +	  PyErr_SetString (PyExc_TypeError,
>> +			   _("memory_source doesn't have a read_memory method"));
>> +	  return nullptr;
>> +	}
>> +    }
>
> IMO we could maybe skip this check too.  Python already produces a clear
> enough exception message when trying to access an attribute that doesn't
> exist, it mentions the name of the user type:
>
>     In [4]: f = Foo()
>
>     In [5]: f.read_memory()
>     ---------------------------------------------------------------------------
>     AttributeError                            Traceback (most recent call last)
>     Input In [5], in <cell line: 1>()
>     ----> 1 f.read_memory()
>
>     AttributeError: 'Foo' object has no attribute 'read_memory'

Done.

>
>> +
>> +  /* Where the result will be written.  */
>> +  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
>> +
>> +  /* Now actually perform the disassembly.  */
>> +  int length
>> +    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
>> +			  disassembler.disasm_info ());
>> +
>> +  if (length == -1)
>> +    {
>> +
>> +      /* In an ideal world, every disassembler should always call the
>> +	 memory error function before returning a status of -1 as the only
>> +	 error a disassembler should encounter is a failure to read
>> +	 memory.  Unfortunately, there are some disassemblers who don't
>> +	 follow this rule, and will return -1 without calling the memory
>> +	 error function.
>> +
>> +	 To make the Python API simpler, we just classify everything as a
>> +	 memory error, but the message has to be modified for the case
>> +	 where the disassembler didn't call the memory error function.  */
>> +      if (disassembler.memory_error_address ().has_value ())
>> +	{
>> +	  CORE_ADDR addr = *disassembler.memory_error_address ();
>> +	  disasmpy_set_memory_error_for_address (addr);
>> +	}
>> +      else
>> +	PyErr_Format (gdbpy_gdb_memory_error, "unknown disassembly error");
>
> Is there a use case for other error kinds?  For example, if the
> disassembler finds that the machine code does not encode a valid
> instruction, what should it do?
>
> Maybe this is more a question for the gdb.disassembler.Disassembler
> implementation side of the API.

No!  Like the comment says, everything should disassemble to something,
even if it's just ".byte xxxx".  The libopcodes disassembler only
supports reporting one type of error, that's memory error.  It's just
unfortunate libopcodes also uses the return value to indicate that an
error occurred, and in some cases disassemblers return -1 (to indicate
error) without setting a memory error.  In these cases the disassembler
has probably written to the output stream what went wrong, but this
really is not how libopcodes is supposed to work.

So, no.  We either have a memory error, or an unknown error.  Ideally,
given enough time, libopcodes will be fixed so that we _only_ ever emit
memory errors.
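
As a rough sketch of that rule, seen from the Python side:
FakeMemoryError is a hypothetical stand-in for gdb.MemoryError, and the
"Cannot access memory" wording is an assumption made for the example,
not necessarily the message GDB produces.

```python
class FakeMemoryError(Exception):
    """Hypothetical stand-in for gdb.MemoryError."""


def check_disassembly_status(length, memory_error_address=None):
    """Classify the status returned by a disassembler.

    A status of -1 is always reported as a memory error, whether or not
    the disassembler told us which address it failed to read."""
    if length == -1:
        if memory_error_address is not None:
            raise FakeMemoryError(
                "Cannot access memory at address 0x%x" % memory_error_address)
        raise FakeMemoryError("unknown disassembly error")
    # Anything that is not an error must be a real, non-empty instruction.
    assert length > 0, "instructions are never zero length"
    return length
```

So a caller only ever has to handle one exception type, with the
"unknown" message covering the disassemblers that return -1 without
reporting a memory error first.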

>
>> +      return nullptr;
>> +    }
>> +
>> +  /* Instructions are either non-zero in length, or we got an error,
>> +     indicated by a length of -1, which we handled above.  */
>> +  gdb_assert (length > 0);
>> +
>> +  /* We should not have seen a memory error in this case.  */
>> +  gdb_assert (!disassembler.memory_error_address ().has_value ());
>> +
>> +  /* Create an object to represent the result of the disassembler.  */
>> +  gdbpy_ref<disasm_result_object> res
>> +    (PyObject_New (disasm_result_object, &disasm_result_object_type));
>> +  res->length = length;
>> +  res->content = new string_file;
>
> Since the DisassemblerResult object type has an __init__, maybe we
> should call it?  It would be a bit of boilerplate, calling __init__
> here, but it would avoid duplicating how we initialize a
> disasm_result_object.

Done.

>
>> +  *(res->content) = disassembler.release ();
>> +
>> +  return reinterpret_cast<PyObject *> (res.release ());
>> +}
>> +
>> +/* Implement gdb.set_enabled function.  Takes a boolean parameter, and
>
> gdb.disassembler._set_enabled?

Fixed.

>
>> +/* This implements the disassemble_info read_memory_func callback.  This
>> +   will either call the standard read memory function, or, if the user has
>> +   supplied a memory source (see disasmpy_builtin_disassemble) then this
>> +   will call back into Python to obtain the memory contents.
>> +
>> +   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
>> +   success (in which case BUFF has been filled), or -1 on error, in which
>> +   case the contents of BUFF are undefined.  */
>> +
>> +int
>> +gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
>> +				      unsigned int len,
>> +				      struct disassemble_info *info)
>> +{
>> +  gdbpy_disassembler *dis
>> +    = static_cast<gdbpy_disassembler *> (info->application_data);
>> +  disasm_info_object *obj = dis->py_disasm_info ();
>> +  PyObject *memory_source = dis->m_memory_source;
>> +
>> +  /* The simple case, the user didn't pass a separate memory source, so we
>> +     just delegate to the standard disassemble_info read_memory_func,
>> +     passing in the original disassemble_info object, which core GDB might
>> +     require in order to read the instruction bytes (when reading the
>> +     instruction from a buffer).  */
>> +  if (memory_source == nullptr)
>> +    return obj->gdb_info->read_memory_func (memaddr, buff, len, obj->gdb_info);
>> +
>> +  /* The user provided a separate memory source, we need to call the
>> +     read_memory method on the memory source and use the buffer it returns
>> +     as the bytes of memory.  */
>> +  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
>> +  gdbpy_ref<> result_obj (PyObject_CallMethod (memory_source, "read_memory",
>> +					       "KL", len, offset));
>> +  if (result_obj == nullptr)
>> +    {
>> +      /* If we got a gdb.MemoryError then we ignore this and just report
>> +	 that the read failed to the caller.  The caller is then
>> +	 responsible for calling the memory_error_func if it wants to.
>> +	 Remember, the disassembler might just be probing to see if these
>> +	 bytes can be read, if we automatically call the memory error
>> +	 function, we can end up registering an error prematurely.  */
>> +      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
>> +	PyErr_Clear ();
>> +      else
>> +	gdbpy_print_stack ();
>> +      return -1;
>> +    }
>> +
>> +  /* Convert the result to a buffer.  */
>> +  Py_buffer py_buff;
>> +  if (!PyObject_CheckBuffer (result_obj.get ())
>> +      || PyObject_GetBuffer (result_obj.get(), &py_buff, PyBUF_CONTIG_RO) < 0)
>> +    {
>> +      PyErr_Format (PyExc_TypeError,
>> +		    _("Result from read_memory is not a buffer"));
>> +      gdbpy_print_stack ();
>> +      return -1;
>> +    }
>> +
>> +  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
>> +     scope.  */
>> +  Py_buffer_up buffer_up (&py_buff);
>> +
>> +  /* Validate that the buffer is the correct length.  */
>> +  if (py_buff.len != len)
>> +    {
>> +      PyErr_Format (PyExc_ValueError,
>> +		    _("Result from read_memory is incorrectly sized buffer"));
>
> It's handy to tell the expected and actual sizes in these cases.

Agreed.  Done.
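
For illustration, the same check written in Python might look like the
sketch below.  check_buffer is a hypothetical helper and the exact
message wording is an assumption, not what the C++ code prints.

```python
def check_buffer(result, expected_len):
    """Validate RESULT as a buffer of exactly EXPECTED_LEN bytes."""
    # memoryview raises TypeError if RESULT does not support the
    # buffer protocol.
    buf = memoryview(result)
    if buf.nbytes != expected_len:
        raise ValueError(
            "Result from read_memory is incorrectly sized buffer: "
            "expected %d bytes, got %d" % (expected_len, buf.nbytes))
    return bytes(buf)
```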

>
>> +      gdbpy_print_stack ();
>> +      return -1;
>> +    }
>> +
>> +  /* Copy the data out of the Python buffer and return succsess.*/
>
> succsess -> success

Fixed.

>
>> +/* A wrapper around a reference to a Python DisassembleInfo object, which
>> +   ensures that the object is marked as invalid when we leave the enclosing
>> +   scope.
>> +
>> +   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
>> +   the time that function returns.  However, there's nothing to stop a user
>> +   caching a reference to the DisassembleInfo, and thus keeping the object
>> +   around.
>> +
>> +   We therefore have the notion of a DisassembleInfo becoming invalid, this
>> +   happens when gdbpy_print_insn returns.  This class is responsible for
>> +   marking the DisassembleInfo as invalid in its destructor.  */
>> +
>> +struct scoped_disasm_info_object
>> +{
>> +  /* Constructor.  */
>> +  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
>> +			     disassemble_info *info)
>> +    : m_disasm_info (allocate_disasm_info_object ())
>> +  {
>> +    m_disasm_info->address = memaddr;
>> +    m_disasm_info->gdb_info = info;
>> +    m_disasm_info->gdbarch = gdbarch;
>> +    m_disasm_info->program_space = current_program_space;
>> +  }
>> +
>> +  /* Upon destruction mark m_diasm_info as invalid.  */
>> +  ~scoped_disasm_info_object ()
>> +  {
>> +    m_disasm_info->gdb_info = nullptr;
>> +  }
>> +
>> +  /* Return a pointer to the underlying disasm_info_object instance.  */
>> +  disasm_info_object *
>> +  get () const
>> +  {
>> +    return m_disasm_info.get ();
>> +  }
>> +
>> +private:
>> +
>> +  /* Wrapper around the call to PyObject_New, this wrapper function can be
>> +     called from the constructor initialization list, while PyObject_New, a
>> +     macro, can't.  */
>> +  static disasm_info_object *
>> +  allocate_disasm_info_object ()
>> +  {
>> +    return (disasm_info_object *) PyObject_New (disasm_info_object,
>> +						&disasm_info_object_type);
>> +  }
>
> This makes me think, is there a way for a Python user to call into the
> disassembler?  Should the DisassembleInfo object have a user-callable
> constructor, should the user want to construct one?
>
> I could imagine you could do this out of nowhere:
>
>   gdb.disassembler.builtin_disassemble(DisassembleInfo(addr, arch, progspace))

No! Don't do that.

We already have gdb.Architecture.disassemble which provides access to
the disassembler.  You might feel that method is misplaced on
Architecture (but that wasn't me!), but it is what it is.

I think if you are writing some random piece of Python code then you
should not be worrying about Python disassemblers vs builtin
disassembler; you should just call gdb.Architecture.disassemble and let
GDB invoke the "correct" disassembler for you.
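
As a small sketch of what I mean: disassemble_one is a hypothetical
helper that only relies on gdb.Architecture.disassemble returning a
list of dicts with 'addr', 'asm' and 'length' keys.  FakeArch is a
stand-in so the example runs outside GDB; inside GDB you would pass a
real gdb.Architecture instead.

```python
def disassemble_one(arch, pc):
    """Return (length, text) for the instruction at PC.

    ARCH is expected to behave like gdb.Architecture."""
    insn = arch.disassemble(pc)[0]
    return insn["length"], insn["asm"]


class FakeArch:
    """Hypothetical stand-in so the sketch runs outside GDB."""

    def disassemble(self, start_pc, end_pc=None, count=None):
        return [{"addr": start_pc, "asm": "nop", "length": 1}]


length, text = disassemble_one(FakeArch(), 0x1000)
```

Code written this way never needs to know whether a Python
disassembler or the builtin one produced the text.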

Preventing direct calls to gdb.disassembler.builtin_disassemble is one
of the main reasons that I deliberately don't provide a user callable
constructor for DisassembleInfo.  During development I did have that
method at one point, and removed it precisely to prevent the above!

>
> But that would skip the Python disassemblers, so a user could also want
> to call this function that doesn't exist today:
>
>   gdb.disassemble(DisassembleInfo(addr, arch, progspace))

I think having:

  gdb.disassemble(start_address, end_address, architecture, program_space)

would be better than the current disassemble method on Architecture.
Thinking about what that does I suspect that I might end up having to
work on Architecture.disassemble at some point in the future, so I might
add a top-level gdb.disassemble and make the existing architecture
method forward to that one.  We'll see.

>
>> +
>> +  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
>> +     containing instance goes out of scope this reference is released,
>> +     however, the user might be holding other references to the
>> +     DisassembleInfo object in Python code, so the underlying object might
>> +     not be deleted.  */
>> +  gdbpy_ref<disasm_info_object> m_disasm_info;
>> +};
>> +
>> +/* See python-internal.h.  */
>> +
>> +gdb::optional<int>
>> +gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
>> +		  disassemble_info *info)
>> +{
>> +  /* Early exit case.  This must be done as early as possible, and
>> +     definitely before we enter Python environment.  The
>> +     python_print_insn_enabled flag is set (from Python) only when the user
>> +     has installed one (or more) Python disassemblers.  So in the common
>> +     case (no custom disassembler installed) this flag will be false,
>> +     allowing for a quick return.  */
>> +  if (!gdb_python_initialized || !python_print_insn_enabled)
>> +    return {};
>> +
>> +  gdbpy_enter enter_py (get_current_arch (), current_language);
>> +
>> +  /* The attribute we are going to lookup that provides the print_insn
>> +     functionality.  */
>> +  static const char *callback_name = "_print_insn";
>> +
>> +  /* Grab a reference to the gdb.disassembler module, and check it has the
>> +     attribute that we need.  */
>> +  gdbpy_ref<> gdb_python_disassembler_module
>> +    (PyImport_ImportModule ("gdb.disassembler"));
>> +  if (gdb_python_disassembler_module == nullptr
>> +      || !PyObject_HasAttrString (gdb_python_disassembler_module.get (),
>> +				  callback_name))
>> +    return {};
>
> Since it's kind of expected that _print_insn is there, should this be a
> gdb_assert?  Just returning silently here makes it more difficult to
> investigate problems, IMO.  The only reason for the assert to trigger
> would be if someone messed with the GDB Python modules, which I think is
> ok.

I've not gone with an assert, but I did rewrite this code so now the
user will get an error if _print_insn is not present.  I did that by
removing the HasAttrString check here, and then...

>
>> +
>> +  /* Now grab the callback attribute from the module.  */
>> +  gdbpy_ref<> hook
>> +    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
>> +			     callback_name));
>> +  if (hook == nullptr)
>
> This can't be true, since you already checked with
> PyObject_HasAttrString.

... this check is now useful, the GetAttrString will fail if _print_insn
is not present, and the PyErr will be set.

>
>> +    {
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  /* Create the new DisassembleInfo object we will pass into Python.  This
>> +     object will be marked as invalid when we leave this scope.  */
>> +  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
>> +  disasm_info_object *disasm_info = scoped_disasm_info.get ();
>> +
>> +  /* Call into the registered disassembler to (possibly) perform the
>> +     disassembly.  */
>> +  PyObject *insn_disas_obj = (PyObject *) disasm_info;
>> +  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
>> +						    insn_disas_obj,
>> +						    nullptr));
>> +
>> +  if (result == nullptr)
>> +    {
>> +      /* The call into Python code resulted in an exception.  If this was a
>> +	 gdb.MemoryError, then we can figure out an address and call the
>> +	 disassemble_info::memory_error_func to report the error back to
>> +	 core GDB.  Any other exception type we assume means a bug in the
>> +	 user's code, and print stack.  */
>> +
>> +      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
>> +	{
>> +	  /* A gdb.MemoryError might have an address attribute which
>> +	     contains the address at which the memory error occurred.  If
>> +	     this is the case then use this address, otherwise, fallback to
>> +	     just using the address of the instruction we were asked to
>> +	     disassemble.  */
>> +	  PyObject *error_type, *error_value, *error_traceback;
>> +	  CORE_ADDR addr;
>> +
>> +	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
>> +
>> +	  if (error_value != nullptr
>> +	      && PyObject_HasAttrString (error_value, "address"))
>> +	    {
>> +	      PyObject *addr_obj = PyObject_GetAttrString (error_value,
>> +							   "address");
>> +	      if (get_addr_from_python (addr_obj, &addr) < 0)
>> +		addr = disasm_info->address;
>> +	    }
>> +	  else
>> +	    addr = disasm_info->address;
>> +
>> +	  PyErr_Clear ();
>> +	  info->memory_error_func (-1, addr, info);
>> +	  return gdb::optional<int> (-1);
>> +	}
>> +      else
>> +	{
>> +	  /* Anything that is not gdb.MemoryError.  */
>> +	  gdbpy_print_stack ();
>> +	  return {};
>> +	}
>> +    }
>> +  else if (result == Py_None)
>> +    {
>> +      /* A return value of None indicates that the Python code could not,
>> +	 or doesn't want to, disassemble this instruction.  Just return an
>> +	 empty result and core GDB will try to disassemble this for us.  */
>> +      return {};
>> +    }
>> +
>> +  /* The call into Python neither raised an exception, or returned None.
>> +     Check to see if the result looks valid.  */
>> +  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
>> +  if (length_obj == nullptr)
>> +    {
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
>> +  if (string_obj == nullptr)
>> +    {
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +  if (!gdbpy_is_string (string_obj.get ()))
>> +    {
>> +      PyErr_SetString (PyExc_TypeError, _("string attribute is not a string."));
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  gdb::unique_xmalloc_ptr<char> string
>> +    = gdbpy_obj_to_string (string_obj.get ());
>> +  if (string == nullptr)
>> +    {
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  long length;
>> +  if (!gdb_py_int_as_long (length_obj.get (), &length))
>> +    {
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
>> +			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
>> +  if (length <= 0 || length > max_insn_length)
>> +    {
>> +      PyErr_SetString (PyExc_ValueError, _("Invalid length attribute."));
>
> IWBN to help the user here, say why it is invalid: the length attribute
> value X exceeds the architecture's max insn length of Y.

Done.

>
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  if (strlen (string.get ()) == 0)
>> +    {
>> +      PyErr_SetString (PyExc_ValueError, _("string attribute must not be empty."));
>> +      gdbpy_print_stack ();
>> +      return {};
>> +    }
>> +
>> +  /* Print the disassembled instruction back to core GDB, and return the
>> +     length of the disassembled instruction.  */
>> +  info->fprintf_func (info->stream, "%s", string.get ());
>> +  return gdb::optional<int> (length);
>> +}
>> +
>> +/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
>> +   deallocating the content buffer.  */
>> +
>> +static void
>> +disasmpy_dealloc_result (PyObject *self)
>> +{
>> +  disasm_result_object *obj = (disasm_result_object *) self;
>> +  delete obj->content;
>> +  Py_TYPE (self)->tp_free (self);
>> +}
>> +
>> +/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
>> +
>> +static gdb_PyGetSetDef disasm_info_object_getset[] = {
>> +  { "address", disasmpy_info_address, nullptr,
>> +    "Start address of the instruction to disassemble.", nullptr },
>> +  { "architecture", disasmpy_info_architecture, nullptr,
>> +    "Architecture to disassemble in", nullptr },
>> +  { "progspace", disasmpy_info_progspace, nullptr,
>> +    "Program space to disassemble in", nullptr },
>> +  { nullptr }   /* Sentinel */
>> +};
>> +
>> +/* The methods of the gdb.disassembler.DisassembleInfo type.  */
>> +
>> +static PyMethodDef disasm_info_object_methods[] = {
>> +  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
>> +    METH_VARARGS | METH_KEYWORDS,
>> +    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
>> +Read LEN octets for the instruction to disassemble." },
>> +  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
>> +    "is_valid () -> Boolean.\n\
>> +Return true if this DisassembleInfo is valid, false if not." },
>> +  {nullptr}  /* Sentinel */
>> +};
>> +
>> +/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
>> +
>> +static gdb_PyGetSetDef disasm_result_object_getset[] = {
>> +  { "length", disasmpy_result_length, nullptr,
>> +    "Length of the disassembled instruction.", nullptr },
>> +  { "string", disasmpy_result_string, nullptr,
>> +    "String representing the disassembled instruction.", nullptr },
>> +  { nullptr }   /* Sentinel */
>> +};
>> +
>> +/* These are the methods we add into the _gdb.disassembler module, which
>> +   are then imported into the gdb.disassembler module.  These are global
>> +   functions that support performing disassembly.  */
>> +
>> +PyMethodDef python_disassembler_methods[] =
>> +{
>> +  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
>> +    METH_VARARGS | METH_KEYWORDS,
>> +    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> None\n\
>> +Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
>> +gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
>> +be an object with the read_memory method." },
>> +  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
>> +    METH_VARARGS | METH_KEYWORDS,
>> +    "_set_enabled (STATE) -> None\n\
>> +Set whether GDB should call into the Python _print_insn code or not." },
>> +  {nullptr, nullptr, 0, nullptr}
>> +};
>> +
>> +/* Structure to define the _gdb.disassembler module.  */
>> +
>> +static struct PyModuleDef python_disassembler_module_def =
>> +{
>> +  PyModuleDef_HEAD_INIT,
>> +  "_gdb.disassembler",
>> +  nullptr,
>> +  -1,
>> +  python_disassembler_methods,
>> +  nullptr,
>> +  nullptr,
>> +  nullptr,
>> +  nullptr
>> +};
>> +
>> +/* Called to initialize the Python structures in this file.  */
>> +
>> +int
>> +gdbpy_initialize_disasm
>> +(void)
>
> Parenthesis on the previous line, and remove void?

What on earth was I thinking here??  Fixed.

I'm still adding some additional tests for the things I've updated in
this patch, but I wanted to reach out to get your feedback on the
memory-source question.

Thanks,
Andrew



* [PATCHv5 0/5] Add Python API for the disassembler
  2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
                         ` (5 preceding siblings ...)
  2022-05-03 10:12       ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
@ 2022-05-06 17:17       ` Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
                           ` (5 more replies)
  6 siblings, 6 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-06 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes in v5:

  - Patch #1, minor typo fixes, and reword some comments in line with
    Simon's feedback.  Have not restructured the class hierarchy, this
    was mentioned in Simon's feedback, but he also said he'd accept
    what I have right now.  I think what I have right now does have
    some benefits, so I've stuck with that for now.

  - Patch #2, minor typo fixes based on Simon's feedback.

  - Patch #3, lots of significant changes.

    + Documentation has been updated and expanded significantly,

    + Added a new 'maint info python-disassemblers' command,

    + Removed the memory_source argument to the builtin_disassemble
      function, DisassembleInfo objects can now be sub-classed to
      achieve the same result,

    + Added additional tests to catch more of the error cases, and
      updated the tests that relate to the memory_source usage that
      has now been removed.

    + Plus all the minor style issues and typos that Simon pointed
      out.

Changes in v4:

  - Patch #1 from v3 series has been merged,

  - Addressed Eli's feedback on previous series,

  - Rebased onto current upstream/master.

Changes in v3:

  - Rebased to current master, and retested,

  - Patch #1 is new in this series,

  - Patch #2 is changed slightly from v2, I've reworked the
    disassembler classes in a slightly different way now, in order to
    prepare for patches #5 and #6.

  - Patch #3 is unchanged from v2,

  - Patch #4 is unchanged from v2,

  - Patch #5 is new in v3.  I've included it here as the changes in #2
    only make sense knowing that patch #5 is coming,

  - Patch #6 is a small cleanup only possible after #2 and #5 have landed.

Changes in v2:

  - The first 3 patches from the v1 series were merged a while back,
    these were all refactoring, or auxiliary features,

  - There's a new #1 patch in the v2 series that does some new
    refactoring of GDB's disassembler classes, this was required in
    order to simplify the #3 patch,

  - Patch #2 in the v2 series is largely unchanged from patch #4 in
    the v1 series,

  - The syntax highlighting work that was in the v1 series was spun
    out into its own patch, and has been merged separately,

  - The format_address helper function that appeared in the v1 series,
    and that Simon suggested I make more general, was spun out into
    its own patch, and merged separately,

  - Finally, patch #3 in the v2 series is pretty much a complete
    rewrite from the v1 series in order to follow the approach
    suggested by Simon.  Results are now returned directly, either via
    'return' or by raising an exception, in contrast to the original
    approach which involved "setting" the result into an existing
    state object.
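
A minimal sketch of the "return or raise" style described in the last
item, runnable outside GDB.  Result and FakeMemoryError are
hypothetical stand-ins for DisassemblerResult and gdb.MemoryError, and
the one-byte "nop" decoding is invented purely for the example.

```python
class FakeMemoryError(Exception):
    """Hypothetical stand-in for gdb.MemoryError."""


class Result:
    """Hypothetical stand-in for DisassemblerResult."""

    def __init__(self, length, string):
        self.length = length
        self.string = string


def disassemble(memory, offset):
    """Return a Result, return None to defer, or raise on unreadable memory."""
    if offset >= len(memory):
        raise FakeMemoryError("cannot read byte at offset %d" % offset)
    byte = memory[offset]
    if byte != 0x90:
        return None          # let the builtin disassembler handle it
    return Result(1, "nop")
```

Every outcome is visible at the call site: a result object, None to
fall back to the builtin disassembler, or an exception for a memory
error, with no shared state object to inspect afterwards.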

---

Andrew Burgess (5):
  gdb: add new base class to gdb_disassembler
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook
  gdb: refactor the non-printing disassemblers
  gdb: unify two dis_asm_read_memory functions in disasm.c

 gdb/Makefile.in                        |    1 +
 gdb/NEWS                               |   34 +
 gdb/arc-linux-tdep.c                   |   15 +-
 gdb/arc-tdep.c                         |   29 +-
 gdb/arc-tdep.h                         |    5 -
 gdb/arm-tdep.c                         |    4 +-
 gdb/data-directory/Makefile.in         |    1 +
 gdb/disasm-selftests.c                 |   70 +-
 gdb/disasm.c                           |  179 ++--
 gdb/disasm.h                           |  207 ++++-
 gdb/doc/gdb.texinfo                    |   41 +
 gdb/doc/python.texi                    |  292 +++++++
 gdb/extension-priv.h                   |   15 +
 gdb/extension.c                        |   20 +
 gdb/extension.h                        |   10 +
 gdb/guile/guile.c                      |    6 +-
 gdb/mips-tdep.c                        |    4 +-
 gdb/python/lib/gdb/disassembler.py     |  178 ++++
 gdb/python/py-disasm.c                 | 1057 ++++++++++++++++++++++++
 gdb/python/python-internal.h           |   16 +
 gdb/python/python.c                    |    3 +
 gdb/s12z-tdep.c                        |   26 +-
 gdb/testsuite/gdb.python/py-disasm.c   |   25 +
 gdb/testsuite/gdb.python/py-disasm.exp |  202 +++++
 gdb/testsuite/gdb.python/py-disasm.py  |  614 ++++++++++++++
 25 files changed, 2857 insertions(+), 197 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4



* [PATCHv5 1/5] gdb: add new base class to gdb_disassembler
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
@ 2022-05-06 17:17         ` Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 2/5] gdb: add extension language print_insn hook Andrew Burgess
                           ` (4 subsequent siblings)
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-06 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

The motivation for this change is an upcoming Python disassembler API
that I would like to add.  As part of that change I need to create a
new disassembler-like class that contains a disassemble_info and a
gdbarch.  The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.

The problem, however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method.  In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.

What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.

My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.

In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.

Now, my new Python disassembler sub-class will still want to print
things to an output stream, and so we will want access to the
dis_asm_fprintf functionality for printing.

However, rather than move this printing code into the
gdb_disassemble_info base class, I have added yet another level of
hierarchy, a gdb_printing_disassembler, thus the class structure is
now:

  struct gdb_disassemble_info {};
  struct gdb_printing_disassembler : public gdb_disassemble_info {};
  struct gdb_disassembler : public gdb_printing_disassembler {};

In a later commit my new Python disassembler will inherit from
gdb_printing_disassembler.

The reason for adding the additional layer to the class hierarchy is
that in yet another commit I intend to rewrite the function
gdb_buffered_insn_length, and to do this I will be creating yet more
disassembler-like classes.  These will not print anything, so I will
add a gdb_non_printing_disassembler class that also inherits from
gdb_disassemble_info.  Knowing that that change is coming, I've gone
with the above class hierarchy now.
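
As a rough illustration of the layering described above, here is a
minimal, self-contained sketch.  It is not the code from this patch:
fake_disassemble_info, fake_gdbarch, disassemble_info_holder,
printing_disassembler, buffering_disassembler and arch_name_from_info
are all hypothetical stand-ins, and a std::string plays the role of
the ui_file stream.  It only shows the shape: a base class owning the
info and the architecture, a printing layer on top, and a concrete
class that supplies print_insn.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for libopcodes' disassemble_info; only the two
// fields this sketch needs.
struct fake_disassemble_info
{
  void *stream = nullptr;
  void *application_data = nullptr;
};

// Hypothetical stand-in for struct gdbarch.
struct fake_gdbarch
{
  std::string name;
};

// Base layer: owns the info object and the architecture, nothing else.
struct disassemble_info_holder
{
  fake_gdbarch *arch () { return m_gdbarch; }
  fake_disassemble_info *disasm_info () { return &m_di; }

protected:
  disassemble_info_holder (fake_gdbarch *gdbarch, void *stream)
    : m_gdbarch (gdbarch)
  {
    assert (gdbarch != nullptr);
    m_di.stream = stream;
    // Stored as a pointer to the base class, so callbacks only ever
    // need to cast back to the base class.
    m_di.application_data = this;
  }

  virtual ~disassemble_info_holder () = default;

  fake_disassemble_info m_di;

private:
  fake_gdbarch *m_gdbarch;
};

// Printing layer: adds a print callback that writes to the stream.
struct printing_disassembler : public disassemble_info_holder
{
protected:
  printing_disassembler (fake_gdbarch *gdbarch, std::string *stream)
    : disassemble_info_holder (gdbarch, stream)
  { /* Nothing.  */ }

  // Append TEXT to the std::string that STREAM points at.
  static void print (void *stream, const char *text)
  { static_cast<std::string *> (stream)->append (text); }
};

// Concrete layer: buffers output and provides print_insn.
struct buffering_disassembler : public printing_disassembler
{
  explicit buffering_disassembler (fake_gdbarch *gdbarch)
    : printing_disassembler (gdbarch, &m_buffer)
  { /* Nothing.  */ }

  // Pretend to disassemble one instruction; return its length in bytes.
  int print_insn ()
  {
    m_buffer.clear ();
    print (m_di.stream, "nop");
    return 1;
  }

  const std::string &text () const { return m_buffer; }

private:
  std::string m_buffer;
};

// Shaped like the arm-tdep.c and mips-tdep.c callbacks: only the base
// class is needed to recover the architecture.
std::string
arch_name_from_info (fake_disassemble_info *info)
{
  auto *di
    = static_cast<disassemble_info_holder *> (info->application_data);
  return di->arch ()->name;
}
```

A class that never calls print_insn can inherit from the printing layer
(or directly from the base) without picking up the buffering behaviour
of the concrete class, which is the point of the split.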

There should be no user visible changes after this commit.
---
 gdb/arm-tdep.c  |   4 +-
 gdb/disasm.c    |  58 +++++++++++++-------
 gdb/disasm.h    | 140 ++++++++++++++++++++++++++++++++++++++----------
 gdb/mips-tdep.c |   4 +-
 4 files changed, 154 insertions(+), 52 deletions(-)

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index aa5d8e6e6bd..c97e648677b 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -8228,8 +8228,8 @@ arm_displaced_step_fixup (struct gdbarch *gdbarch,
 static int
 gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   if (arm_pc_is_thumb (gdbarch, memaddr))
diff --git a/gdb/disasm.c b/gdb/disasm.c
index f2df5ef7bc5..6ac84388cc3 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -166,7 +166,8 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
 /* Format disassembler output to STREAM.  */
 
 int
-gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+gdb_printing_disassembler::fprintf_func (void *stream,
+					 const char *format, ...)
 {
   va_list args;
 
@@ -180,9 +181,9 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
 /* See disasm.h.  */
 
 int
-gdb_disassembler::dis_asm_styled_fprintf (void *stream,
-					  enum disassembler_style style,
-					  const char *format, ...)
+gdb_printing_disassembler::fprintf_styled_func (void *stream,
+						enum disassembler_style style,
+						const char *format, ...)
 {
   va_list args;
 
@@ -797,26 +798,41 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
-  : m_gdbarch (gdbarch),
+				    read_memory_ftype func)
+  : gdb_printing_disassembler (gdbarch, &m_buffer, func,
+			       dis_asm_memory_error, dis_asm_print_address),
     m_buffer (!use_ext_lang_colorization_p && disassembler_styling
 	      && file->can_emit_style_escape ()),
     m_dest (file)
+{ /* Nothing.  */ }
+
+/* See disasm.h.  */
+
+gdb_disassemble_info::gdb_disassemble_info
+  (struct gdbarch *gdbarch, struct ui_file *stream,
+   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
+   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
+   fprintf_styled_ftype fprintf_styled_func)
+    : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
-			 dis_asm_styled_fprintf);
+  gdb_assert (fprintf_func != nullptr);
+  gdb_assert (fprintf_styled_func != nullptr);
+  init_disassemble_info (&m_di, stream, fprintf_func,
+			 fprintf_styled_func);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
-  m_di.print_address_func = dis_asm_print_address;
-  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
-     disassembler had a local optimization here.  By default it would
-     access the executable file, instead of the target memory (there
-     was a growing list of exceptions though).  Unfortunately, the
-     heuristic was flawed.  Commands like "disassemble &variable"
-     didn't work as they relied on the access going to the target.
-     Further, it has been superseeded by trust-read-only-sections
-     (although that should be superseeded by target_trust..._p()).  */
-  m_di.read_memory_func = read_memory_func;
+
+  /* The memory_error_func, print_address_func, and read_memory_func are
+     all initialized to a default (non-nullptr) value by the call to
+     init_disassemble_info above.  If the user is overriding these fields
+     (by passing non-nullptr values) then do that now, otherwise, leave
+     these fields as the defaults.  */
+  if (memory_error_func != nullptr)
+    m_di.memory_error_func = memory_error_func;
+  if (print_address_func != nullptr)
+    m_di.print_address_func = print_address_func;
+  if (read_memory_func != nullptr)
+    m_di.read_memory_func = read_memory_func;
+
   m_di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
   m_di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
   m_di.endian = gdbarch_byte_order (gdbarch);
@@ -828,7 +844,9 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
   disassemble_init_for_target (&m_di);
 }
 
-gdb_disassembler::~gdb_disassembler ()
+/* See disasm.h.  */
+
+gdb_disassemble_info::~gdb_disassemble_info ()
 {
   disassemble_free_target (&m_di);
 }
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 7efab7db46c..f31ca92b038 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -26,43 +26,137 @@ struct gdbarch;
 struct ui_out;
 struct ui_file;
 
-class gdb_disassembler
-{
-  using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
-
-public:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
-    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
-  {}
+/* A wrapper around a disassemble_info and a gdbarch.  This is the core
+   set of data that all disassembler sub-classes will need.  This class
+   doesn't actually implement the disassembling process, that is something
+   that sub-classes will do, with each sub-class doing things slightly
+   differently.
 
-  ~gdb_disassembler ();
+   The constructor of this class is protected, you should not create
+   instances of this class directly, instead create an instance of an
+   appropriate sub-class.  */
 
-  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
-
-  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+struct gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
 
-  /* Return the gdbarch of gdb_disassembler.  */
+  /* Return the gdbarch we are disassembling for.  */
   struct gdbarch *arch ()
   { return m_gdbarch; }
 
+  /* Return a pointer to the disassemble_info, this will be needed for
+     passing into the libopcodes disassembler.  */
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 protected:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
 
+  /* Types for the function callbacks within m_di.  */
+  using read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using memory_error_ftype = decltype (disassemble_info::memory_error_func);
+  using print_address_ftype = decltype (disassemble_info::print_address_func);
+  using fprintf_ftype = decltype (disassemble_info::fprintf_func);
+  using fprintf_styled_ftype = decltype (disassemble_info::fprintf_styled_func);
+
+  /* Constructor, many fields in m_di are initialized from GDBARCH.  STREAM
+     is where the output of the disassembler will be written to, the
+     remaining arguments are function callbacks that are written into
+     m_di.  Of these function callbacks FPRINTF_FUNC and
+     FPRINTF_STYLED_FUNC must not be nullptr.  If READ_MEMORY_FUNC,
+     MEMORY_ERROR_FUNC, or PRINT_ADDRESS_FUNC are nullptr, then that field
+     within m_di is left with its default value (see the libopcodes
+     function init_disassemble_info for the defaults).  */
+  gdb_disassemble_info (struct gdbarch *gdbarch,
+			struct ui_file *stream,
+			read_memory_ftype read_memory_func,
+			memory_error_ftype memory_error_func,
+			print_address_ftype print_address_func,
+			fprintf_ftype fprintf_func,
+			fprintf_styled_ftype fprintf_styled_func);
+
+  /* Destructor.  */
+  virtual ~gdb_disassemble_info ();
+
+  /* The stream that disassembler output is being written to.  */
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
-private:
-  struct gdbarch *m_gdbarch;
-
   /* Stores data required for disassembling instructions in
      opcodes.  */
   struct disassemble_info m_di;
 
+private:
+  /* The architecture we are disassembling for.  */
+  struct gdbarch *m_gdbarch;
+
   /* If we own the string in `m_di.disassembler_options', we do so
      using this field.  */
   std::string m_disassembler_options_holder;
+};
+
+/* A wrapper around gdb_disassemble_info.  This class adds default
+   print functions that are supplied to the disassemble_info within the
+   parent class.  These default print functions write to the stream, which
+   is also contained in the parent class.
+
+   As with the parent class, the constructor for this class is protected,
+   you should not create instances of this class, but create an
+   appropriate sub-class instead.  */
 
+struct gdb_printing_disassembler : public gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_printing_disassembler);
+
+protected:
+
+  /* Constructor.  All the arguments are just passed to the parent class.
+     We also add the two print functions to the arguments passed to the
+     parent.  See gdb_disassemble_info for a description of how the
+     arguments are handled.  */
+  gdb_printing_disassembler (struct gdbarch *gdbarch,
+			     struct ui_file *stream,
+			     read_memory_ftype read_memory_func,
+			     memory_error_ftype memory_error_func,
+			     print_address_ftype print_address_func)
+    : gdb_disassemble_info (gdbarch, stream, read_memory_func,
+			    memory_error_func, print_address_func,
+			    fprintf_func, fprintf_styled_func)
+  { /* Nothing.  */ }
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_styled_func (void *stream,
+				  enum disassembler_style style,
+				  const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A disassembler class that provides 'print_insn', a method for
+   disassembling a single instruction to the output stream.  */
+
+struct gdb_disassembler : public gdb_printing_disassembler
+{
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
+    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
+
+  /* Disassemble a single instruction at MEMADDR to the ui_file* that was
+     passed to the constructor.  If a memory error occurs while
+     disassembling this instruction then an error will be thrown.  */
+  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+
+protected:
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    read_memory_ftype func);
+
+private:
   /* This member variable is given a value by calling dis_asm_memory_error.
      If after calling into the libopcodes disassembler we get back a
      negative value (which indicates an error), then, if this variable has
@@ -95,16 +189,6 @@ class gdb_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_fprintf (void *stream, const char *format, ...)
-    ATTRIBUTE_PRINTF(2,3);
-
-  /* Print formatted message to STREAM, the content can be styled based on
-     STYLE if desired.  */
-  static int dis_asm_styled_fprintf (void *stream,
-				     enum disassembler_style style,
-				     const char *format, ...)
-    ATTRIBUTE_PRINTF(3,4);
-
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index ffed8723dce..7684ae638e5 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7018,8 +7018,8 @@ reinit_frame_cache_sfunc (const char *args, int from_tty,
 static int
 gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
-- 
2.25.4



* [PATCHv5 2/5] gdb: add extension language print_insn hook
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-05-06 17:17         ` Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
                           ` (3 subsequent siblings)
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-06 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit is preparation for the next commit.

In the next commit I will add a Python API to intercept the print_insn
calls within GDB.  Each print_insn call is responsible for
disassembling and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.
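
The dispatch order described above can be sketched in isolation as
follows.  This is only an illustration under stated assumptions:
print_insn_hook, ext_lang_print_insn_sketch and print_insn_sketch are
hypothetical names, std::optional stands in for gdb::optional, and a
plain address replaces the gdbarch and disassemble_info arguments.

```cpp
#include <cassert>
#include <functional>
#include <optional>
#include <vector>

// Hypothetical hook type: return the instruction length if this
// extension language handled the instruction, or an empty optional to
// decline.
using print_insn_hook
  = std::function<std::optional<int> (unsigned long addr)>;

// Ask each hook in turn; the first non-empty result wins and any
// remaining hooks are not called.
std::optional<int>
ext_lang_print_insn_sketch (const std::vector<print_insn_hook> &hooks,
			    unsigned long addr)
{
  for (const print_insn_hook &hook : hooks)
    {
      // An empty hook models a language whose print_insn is nullptr.
      if (hook == nullptr)
	continue;
      std::optional<int> length = hook (addr);
      if (length.has_value ())
	return length;
    }

  return {};
}

// Try the extension languages first, then fall back to the
// architecture's own printer.
int
print_insn_sketch (const std::vector<print_insn_hook> &hooks,
		   const std::function<int (unsigned long)> &arch_print_insn,
		   unsigned long addr)
{
  assert (arch_print_insn != nullptr);

  std::optional<int> length = ext_lang_print_insn_sketch (hooks, addr);
  if (length.has_value ())
    return *length;

  return arch_print_insn (addr);
}
```

With every hook left empty, as is the case for Python and Guile after
this commit, the sketch always reaches the fallback, which matches the
"no user visible change" claim below.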

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.

There should be no user visible change after this commit.
---
 gdb/disasm.c         | 29 ++++++++++++++++++++++++++---
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 10 ++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 6ac84388cc3..4af40c916b2 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -851,6 +851,29 @@ gdb_disassemble_info::~gdb_disassemble_info ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 /* See disasm.h.  */
 
 bool gdb_disassembler::use_ext_lang_colorization_p = true;
@@ -864,7 +887,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
   m_err_memaddr.reset ();
   m_buffer.clear ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   /* If we have successfully disassembled an instruction, styling is on, we
      think that the extension language might be able to perform styling for
@@ -899,7 +922,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
 	  gdb_assert (!m_buffer.term_out ());
 	  m_buffer.~string_file ();
 	  new (&m_buffer) string_file (true);
-	  length = gdbarch_print_insn (arch (), memaddr, &m_di);
+	  length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 	  gdb_assert (length > 0);
 	}
     }
@@ -1054,7 +1077,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index d9450b51231..7c74e721c57 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -263,6 +263,21 @@ struct extension_language_ops
      contents, or an empty optional.  */
   gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
 						 gdbarch *gdbarch);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassembler_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 8f39b86e952..5a805bea00e 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	= extlang->ops->print_insn (gdbarch, address, info);
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 7eb89530c44..47839ea50be 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -327,6 +327,16 @@ extern gdb::optional<std::string> ext_lang_colorize
 extern gdb::optional<std::string> ext_lang_colorize_disasm
   (const std::string &content, gdbarch *gdbarch);
 
+/* Calls extension_language_ops::print_insn for each extension language,
+   returning the result from the first extension language that returns a
+   non-empty result (any further extension languages are not then called).
+
+   All arguments are forwarded to extension_language_ops::print_insn, see
+   that function for a full description.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #if GDB_SELF_TEST
 namespace selftests {
 extern void (*hook_set_active_ext_lang) ();
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index c7be48fb739..14b191ded62 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 11aaa7ae778..b5b8379e23c 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -166,6 +166,8 @@ static const struct extension_language_ops python_extension_ops =
   gdbpy_colorize,
 
   gdbpy_colorize_disasm,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 #endif /* HAVE_PYTHON */
-- 
2.25.4



* [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 2/5] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-05-06 17:17         ` Andrew Burgess
  2022-05-06 18:11           ` Eli Zaretskii
  2022-05-06 17:17         ` [PATCHv5 4/5] gdb: refactor the non-printing disassemblers Andrew Burgess
                           ` (2 subsequent siblings)
  5 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-05-06 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to the libopcodes disassemble_info structure.  It has
  read-only properties address, architecture, and progspace, and
  methods __init__, read_memory, and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user-written disassembler function.  By
  reading the properties and calling the methods (and other support
  functions in the gdb.disassembler module) the user can perform and
  return the disassembly.

  Disassembler

      This is a base-class which user written disassemblers should
  inherit from.  This base class provides base implementations of
  __init__ and __call__ which the user written disassembler should
  override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler.  It is really just a wrapper around a string (the text
  of the disassembled instruction) and a length (in bytes).  The user
  can return an instance of this class from Disassembler.__call__ to
  represent the newly disassembled instruction.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, thus the user can adjust what GDB
  sees when it is disassembling.

The included documentation provides a more detailed description of the
API.

There is also a new CLI command added:

  maint info python-disassemblers

This command is defined in the Python gdb.disassembler module, and
can be used to list the currently registered Python disassemblers.
---
 gdb/Makefile.in                        |    1 +
 gdb/NEWS                               |   34 +
 gdb/data-directory/Makefile.in         |    1 +
 gdb/doc/gdb.texinfo                    |   41 +
 gdb/doc/python.texi                    |  292 +++++++
 gdb/python/lib/gdb/disassembler.py     |  178 ++++
 gdb/python/py-disasm.c                 | 1057 ++++++++++++++++++++++++
 gdb/python/python-internal.h           |   16 +
 gdb/python/python.c                    |    3 +-
 gdb/testsuite/gdb.python/py-disasm.c   |   25 +
 gdb/testsuite/gdb.python/py-disasm.exp |  202 +++++
 gdb/testsuite/gdb.python/py-disasm.py  |  614 ++++++++++++++
 12 files changed, 2463 insertions(+), 1 deletion(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 418094775a5..42a0ebb371b 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index 982f4a1a18c..ddbaff51f89 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -38,6 +38,40 @@ maintenance info line-table
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is a sub-class of gdb.disassembler.Disassembler.
+       ARCH is either None or a string containing a bfd architecture
+       name.  DISASSEMBLER is registered as a disassembler for
+       architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned, this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', and 'architecture', as well as the
+       following method: 'read_memory'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 38ad2ac32b0..dd2fe1acf0d 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -39529,6 +39529,47 @@
 @item maint info jit
 Print information about JIT code objects loaded in the current inferior.
 
+@anchor{maint info python-disassemblers}
+@kindex maint info python-disassemblers
+@item maint info python-disassemblers
+This command is defined within the @code{gdb.disassembler} Python
+module (@pxref{Disassembly In Python}), and will only be present after
+that module has been imported.  To force the module to be imported do
+the following:
+
+@smallexample
+(@value{GDBP}) python import gdb.disassembler
+@end smallexample
+
+This command lists all the architectures for which a disassembler is
+currently registered, and the name of the disassembler.  If a
+disassembler is registered for all architectures, then this is listed
+last against the @samp{GLOBAL} architecture.
+
+If one of the disassemblers would be selected for the architecture of
+the current inferior, then this disassembler will be marked.
+
+The following example shows a situation in which two disassemblers are
+registered, initially the @samp{i386} disassembler matches the current
+architecture, then the architecture is changed, now the @samp{GLOBAL}
+disassembler matches.
+
+@smallexample
+(@value{GDBP}) show architecture
+The target architecture is set to "auto" (currently "i386").
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassember Name
+i386                Disassembler_1	(Matches current architecture)
+GLOBAL              Disassembler_2
+(@value{GDBP}) set architecture arm
+The target architecture is set to "arm".
+(@value{GDBP}) maint info python-disassemblers
+quit
+Architecture        Disassember Name
+i386                Disassembler_1
+GLOBAL              Disassembler_2	(Matches current architecture)
+@end smallexample
+
 @kindex set displaced-stepping
 @kindex show displaced-stepping
 @cindex displaced stepping support
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index cb5283e03c0..41612a6411d 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction disassembly in Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6562,6 +6565,295 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex python instruction disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.
+
+Instances of this type are usually created within @value{GDBN},
+however, it is possible to create a copy of an instance of this type,
+see the description of @code{__init__} for more details.
+
+This class has the following properties and methods:
+
+@defvar DisassembleInfo.address
+A read-only integer containing the address at which @value{GDBN}
+wishes to disassemble a single instruction.
+@end defvar
+
+@defvar DisassembleInfo.architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.  This property is
+read-only.
+@end defvar
+
+@defvar DisassembleInfo.progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling.  This
+property is read-only.
+@end defvar
+
+@defun DisassembleInfo.read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly.  In some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory, and calling this method handles this
+detail.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).  The
+length of the returned buffer will always be exactly @var{length}.
+
+If @value{GDBN} is unable to read the required memory then a
+@code{gdb.MemoryError} exception is raised (@pxref{Exception
+Handling}).  Raising any other exception type from this method is an
+error.
+
+While disassembling a single instruction there could be multiple calls
+to this method, and the same bytes might be read multiple times.  Any
+single call might only read a subset of the total instruction bytes.
+
+Consider, for example, an architecture with 2-byte and 4-byte
+instructions.  The disassembler might first read 2 bytes from memory in
+order to establish if the instruction is 2 or 4 bytes long.  If the
+instruction is 4 bytes long then the disassembler might then read the
+remaining 2 bytes, or might read the entire 4 bytes again.  The memory
+reading behaviour of the disassembler on different architectures could
+be different.
+@end defun
+
+@defun DisassembleInfo.is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if the object is invalid.
+@end defun
+
+@defun DisassembleInfo.__init__ (info)
+This can be used to create a new @code{DisassembleInfo} object that is
+a copy of @var{info}.  The copy will have the same @code{address},
+@code{architecture}, and @code{progspace} values as @var{info}, and
+will become invalid at the same time as @var{info}.
+
+This method exists so that sub-classes of @code{DisassembleInfo} can
+be created.  These sub-classes must be initialized as copies of an
+existing @code{DisassembleInfo} object, but they might choose
+to override the @code{read_memory} method, and so control what
+@value{GDBN} sees when reading from memory
+(@pxref{builtin_disassemble}).
+
+@end defun
+
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defun Disassembler.__init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.
+@end defun
+
+@defun Disassembler.__call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Alternatively, this function can return a @code{DisassemblerResult}
+that represents the disassembled instruction.  This type is described
+in more detail below.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory.  This will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error.  @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defun
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class is used to hold the result of calling
+@w{@code{Disassembler.__call__}}, and represents a single disassembled
+instruction.  This class has the following properties and methods:
+
+@defun DisassemblerResult.__init__ (length, string)
+Initialise an instance of this class.  @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+
+@defvar DisassemblerResult.length
+A read-only property containing the length of the disassembled
+instruction in bytes.  This will always be greater than zero.
+@end defvar
+
+@defvar DisassemblerResult.string
+A read-only property containing a non-empty string representing the
+disassembled instruction.
+@end defvar
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}, or @code{None}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+If @var{disassembler} is @code{None} then any disassembler currently
+registered for @var{architecture} is removed; the previously
+registered disassembler is still returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+
+To see which disassemblers have been registered the @kbd{maint info
+python-disassemblers} command can be used (@pxref{maint info
+python-disassemblers}).
+@end defun
+
+@anchor{builtin_disassemble}
+@defun builtin_disassemble (info)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance of
+@code{DisassembleInfo} or of one of its sub-classes.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then a @code{gdb.MemoryError}
+exception will be raised.
+
+When the builtin disassembler needs to read memory the
+@code{read_memory} method on @var{info} will be called.  By
+sub-classing @code{DisassembleInfo} and overriding the
+@code{read_memory} method, it is possible to intercept calls to
+@code{read_memory} by the builtin disassembler, and to modify the
+values returned.
+
+It is important to understand that, even when
+@code{DisassembleInfo.read_memory} raises a @code{gdb.MemoryError}, it
+is the builtin disassembler itself that reports the memory error to
+@value{GDBN}.  The reason for this is that the disassembler might
+probe memory to see if a byte is readable or not; if the byte can't be
+read then the disassembler may choose not to report an error, but
+instead to disassemble the bytes that it does have available.
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        text = result.string + "\t## Comment"
+        return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
+The following example creates a sub-class of @code{DisassembleInfo} in
+order to intercept the @code{read_memory} calls.  Within
+@code{read_memory}, any bytes read from memory have their two 4-bit
+nibbles swapped around.  This isn't a very useful adjustment, but
+serves as an example.
+
+@smallexample
+class MyInfo(gdb.disassembler.DisassembleInfo):
+    def __init__(self, info):
+        super().__init__(info)
+
+    def read_memory(self, length, offset):
+        buffer = super().read_memory(length, offset)
+        result = bytearray()
+        for b in buffer:
+            v = int.from_bytes(b, 'little')
+            v = (v << 4) & 0xf0 | (v >> 4)
+            result.append(v)
+        return memoryview(result)
+
+class NibbleSwapDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("NibbleSwapDisassembler")
+
+    def __call__(self, info):
+        info = MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+gdb.disassembler.register_disassembler(NibbleSwapDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
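
Note on the NibbleSwapDisassembler example in the python.texi hunk above:
the byte transformation itself can be checked outside of GDB.  The
following is only a sketch; swap_nibbles is a hypothetical helper name
that is not part of this patch.  It assumes the buffer elements are
either ints or one-byte bytes objects (the documentation example's use
of int.from_bytes suggests the latter).

```python
def swap_nibbles(buffer):
    """Return a bytearray with the two 4-bit halves of every byte exchanged.

    Hypothetical helper for illustration only, not part of the patch.
    """
    result = bytearray()
    for b in buffer:
        # Accept either ints or one-byte bytes objects as elements.
        v = b if isinstance(b, int) else int.from_bytes(b, "little")
        # Move the low nibble up, mask to 8 bits, then bring the high
        # nibble down.
        result.append(((v << 4) & 0xF0) | (v >> 4))
    return result
```

Applying the helper twice gives back the original bytes, which is a
quick sanity check for the mask and shifts.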
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..5a2d94a5fac
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,178 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+# Re-export everything from the _gdb.disassembler module, which is
+# defined within GDB's C++ code.
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler, or None.  ARCHITECTURE
+    is either None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality.  This improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except Exception:
+            # It's pretty unlikely this exception case will ever
+            # trigger, one situation would be if the user somehow
+            # corrupted the _disassemblers_dict variable such that it
+            # was no longer a dictionary.
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class maint_info_py_disassemblers_cmd(gdb.Command):
+    """
+    List all registered Python disassemblers.
+
+    List the name of all registered Python disassemblers, next to the
+    name of the architecture for which the disassembler is registered.
+
+    The global Python disassembler is listed next to the string
+    'GLOBAL'.
+
+    The disassembler that matches the architecture of the currently
+    selected inferior will be marked, this is an indication of which
+    disassembler will be invoked if any disassembly is performed in
+    the current inferior.
+    """
+
+    def __init__(self):
+        super().__init__("maintenance info python-disassemblers", gdb.COMMAND_USER)
+
+    def invoke(self, args, from_tty):
+        # If no disassemblers are registered, tell the user.
+        if len(_disassemblers_dict) == 0:
+            print("No Python disassemblers registered.")
+            return
+
+        # Figure out the longest architecture name, so we can
+        # correctly format the table of results.
+        longest_arch_name = 0
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                # Measure the architecture name, not the disassembler name.
+                if len(architecture) > longest_arch_name:
+                    longest_arch_name = len(architecture)
+
+        # Figure out the name of the current architecture.  There
+        # should always be a current inferior, but if, somehow, there
+        # isn't, then leave curr_arch as the empty string, which will
+        # not then match against any architecture in the dictionary.
+        curr_arch = ""
+        if gdb.selected_inferior() is not None:
+            curr_arch = gdb.selected_inferior().architecture().name()
+
+        # Now print the dictionary of registered disassemblers out to
+        # the user.
+        match_tag = "\t(Matches current architecture)"
+        fmt_len = max(longest_arch_name, len("Architecture"))
+        format_string = "{:" + str(fmt_len) + "s} {:s}"
+        print(format_string.format("Architecture", "Disassembler Name"))
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                name = _disassemblers_dict[architecture].name
+                if architecture == curr_arch:
+                    name += match_tag
+                    match_tag = ""
+                print(format_string.format(architecture, name))
+        if None in _disassemblers_dict:
+            name = _disassemblers_dict[None].name + match_tag
+            print(format_string.format("GLOBAL", name))
+
+
+maint_info_py_disassemblers_cmd()
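
Note on the registration and lookup rules implemented in disassembler.py
above: the sketch below models them with a plain dictionary so they can
be tried outside of GDB.  The names _registry, register and lookup are
hypothetical stand-ins for _disassemblers_dict, register_disassembler
and the nested lookup_disassembler.  Plain strings stand in for
Disassembler objects, and the type check and the _set_enabled call are
left out.

```python
# Hypothetical stand-in for _disassemblers_dict.  Keys are architecture
# names, or None for the global disassembler.
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE and return the previous entry."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture specific entry, else fall back to the global."""
    if arch_name is None:
        return None
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)
```

With this model, registering None for an architecture removes that
entry, after which lookups for that architecture fall back to the
global entry, regardless of the order in which the two were registered.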
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..c67b2e97664
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,1057 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object
+{
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+
+  /* If copies of this object are created then they are chained together
+     via this NEXT pointer, this allows all the copies to be invalidated at
+     the same time as the parent object.  */
+  struct disasm_info_object *next;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object
+{
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling _gdb.disassembler._set_enabled()
+   when the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_printing_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Fill in OBJ with all the other arguments.  */
+
+static void
+disasm_info_fill (disasm_info_object *obj, struct gdbarch *gdbarch,
+		  program_space *progspace, bfd_vma address,
+		  disassemble_info *di, disasm_info_object *next)
+{
+  obj->gdbarch = gdbarch;
+  obj->program_space = progspace;
+  obj->address = address;
+  obj->gdb_info = di;
+  obj->next = next;
+}
+
+/* Implement DisassembleInfo.__init__.  Takes a single argument that must
+   be another DisassembleInfo object and copies the contents from the
+   argument into this new object.  */
+
+static int
+disasm_info_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "info", NULL };
+  PyObject *info_obj;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "O!", keywords,
+					&disasm_info_object_type,
+					&info_obj))
+    return -1;
+
+  disasm_info_object *other = (disasm_info_object *) info_obj;
+  disasm_info_object *info = (disasm_info_object *) self;
+  disasm_info_fill (info, other->gdbarch, other->program_space,
+		    other->address, other->gdb_info, other->next);
+  other->next = info;
+
+  /* As the OTHER object now holds a pointer to INFO we inc the ref count
+     on INFO.  This stops INFO being deleted until OTHER has gone away.  */
+  Py_INCREF ((PyObject *) info);
+  return 0;
+}
+
+/* The tp_dealloc callback for the DisassembleInfo type.  */
+
+static void
+disasm_info_dealloc (PyObject *self)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+
+  /* We no longer care about the object our NEXT pointer points at, so we
+     can decrement its reference count.  This macro handles the case when
+     NEXT is nullptr.  */
+  Py_XDECREF ((PyObject *) obj->next);
+
+  /* Now core deallocation behaviour.  */
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  PyObject *address_obj = gdb_py_object_from_longest (address).release ();
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj);
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))				\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Initialise OBJ, a DisassemblerResult object with LENGTH and CONTENT.
+   OBJ might already have been initialised, in which case any existing
+   content should be discarded before the new CONTENT is moved in.  */
+
+static void
+disasmpy_init_disassembler_result (disasm_result_object *obj, int length,
+				   std::string content)
+{
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  else
+    obj->content->clear ();
+
+  obj->length = length;
+  *(obj->content) = std::move (content);
+}
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (disasm_info);
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers who don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_SetString (gdbpy_gdb_memory_error, "Unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create a DisassemblerResult containing the results.  */
+  std::string content = disassembler.release ();
+  PyTypeObject *type = &disasm_result_object_type;
+  gdbpy_ref<disasm_result_object> res
+    ((disasm_result_object *) type->tp_alloc (type, 0));
+  disasmpy_init_disassembler_result (res.get (), length, std::move (content));
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement _gdb.disassembler._set_enabled.  Takes a boolean parameter, and
+   sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback and is
+   called from the libopcodes disassembler when the disassembler wants to
+   read memory.
+
+   From the INFO argument we can find the gdbpy_disassembler object for
+   which we are disassembling, and from that object we can find the
+   DisassembleInfo for the current disassembly call.
+
+   This function reads the instruction bytes by calling the read_memory
+   method on the DisassembleInfo object.  This method might have been
+   overridden by user code.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+
+  /* The DisassembleInfo.read_memory method expects an offset from the
+     address stored within the DisassembleInfo object; calculate that
+     offset here.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+
+  /* Now call the DisassembleInfo.read_memory method.  This might have been
+     overridden by the user.  */
+  gdbpy_ref<> result_obj (PyObject_CallMethod ((PyObject *) obj,
+					       "read_memory",
+					       "IL", len, (long long) offset));
+
+  /* Handle any exceptions.  */
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read; if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Buffer returned from read_memory is sized %zd instead of the expected %u"),
+		    py_buff.len, len);
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", nullptr };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  disasmpy_init_disassembler_result (obj, length, std::string (string));
+
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Record
+   MEMADDR, the address of the failed access, in the gdbpy_disassembler
+   so that the caller can later raise a gdb.MemoryError.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    disasm_info_fill (m_disasm_info.get (), gdbarch, current_program_space,
+		      memaddr, info, nullptr);
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    /* Invalidate the original DisassembleInfo object as well as any copies
+       that the user might have made.  */
+    for (disasm_info_object *obj = m_disasm_info.get ();
+	 obj != nullptr;
+	 obj = obj->next)
+      obj->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* Import the gdb.disassembler module.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Get the _print_insn attribute from the module, this should be the
+     function we are going to call to actually perform the disassembly.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     "_print_insn"));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print the stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fallback to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+	  gdbpy_ref<> t (error_type), v (error_value), tb (error_traceback);
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      gdbpy_ref<> addr_obj (PyObject_GetAttrString (error_value,
+							    "address"));
+	      if (get_addr_from_python (addr_obj.get (), &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* Check the result is a DisassemblerResult (or a sub-class).  */
+  if (!PyObject_IsInstance (result.get (),
+			    (PyObject *) &disasm_result_object_type))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("Result is not a DisassemblerResult."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* The call into Python neither raised an exception nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("String attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0)
+    {
+      PyErr_SetString
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length must be greater than 0."));
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (length > max_insn_length)
+    {
+      PyErr_Format
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length %ld greater than architecture maximum of %ld"),
+	 length, max_insn_length);
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("String attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in.", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with a read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm ()
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  disasm_info_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasm_info_dealloc,				/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasm_info_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index b5b8379e23c..084b3687fec 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..432a1c61d02
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,202 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture, this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements, the
+# first element is the name of a class in py-disasm.py, this is a
+# disassembler class.  The second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each different disassembler tests some different feature of the
+# Python disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "MemorySourceNotABufferDisassembler" "${addr_pattern}Python Exception <class 'TypeError'>: Result from read_memory is not a buffer\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "MemorySourceBufferTooLongDisassembler" "${addr_pattern}Python Exception <class 'ValueError'>: Buffer returned from read_memory is sized $decimal instead of the expected $decimal\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "ResultOfWrongType" "${addr_pattern}Python Exception <class 'TypeError'>: Result is not a DisassemblerResult.\r\n.*"] \
+	 [list "ResultWithInvalidLength" "${addr_pattern}Python Exception <class 'ValueError'>: Invalid length attribute: length must be greater than 0.\r\n.*"] \
+	 [list "ResultWithInvalidString" "${addr_pattern}Python Exception <class 'ValueError'>: String attribute must not be empty.\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check some errors relating to DisassemblerResult creation.
+with_test_prefix "DisassemblerResult errors" {
+    gdb_test "python gdb.disassembler.DisassemblerResult(0, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(-1, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(1, '')" \
+	[multi_line \
+	     "ValueError: String must not be empty." \
+	     "Error while executing Python code."]
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python analyzing_disassembler = add_global_disassembler(AnalyzingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
+
+# Test the 'maint info python-disassemblers' command.
+with_test_prefix "maint info python-disassemblers" {
+    py_remove_all_disassemblers
+    gdb_test "maint info python-disassemblers" "No Python disassemblers registered\\." \
+	"list disassemblers, none registered"
+    gdb_test_no_output "python disasm = add_global_disassembler(BuiltinDisassembler)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "GLOBAL\\s+BuiltinDisassembler\\s+\\(Matches current architecture\\)"] \
+	"list disassemblers, single global disassembler"
+    gdb_test_no_output "python arch = gdb.selected_inferior().architecture().name()"
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(disasm, arch)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "\[^\r\n\]+BuiltinDisassembler\\s+\\(Matches current architecture\\)" \
+	     "GLOBAL\\s+BuiltinDisassembler"] \
+	"list disassemblers, multiple disassemblers registered"
+}
+
+# Check that an attempt to create a "new" DisassembleInfo object fails.
+with_test_prefix "Bad DisassembleInfo creation" {
+    gdb_test_no_output "python my_info = InvalidDisassembleInfo()"
+    gdb_test "python print(my_info.is_valid())" "True"
+    gdb_test "python gdb.disassembler.builtin_disassemble(my_info)" \
+	[multi_line \
+	     "RuntimeError: DisassembleInfo is no longer valid\\." \
+	     "Error while executing Python code\\."]
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..62925ce8c06
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,614 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+# Remove all currently registered disassemblers.
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super().__init__("TestDisassembler")
+        self.__info = None
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        self.__info = info
+        return self.disassemble(info)
+
+    def get_info(self):
+        return self.__info
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        byte_str = ""
+        for o in range(length):
+            if byte_str != "":
+                byte_str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            byte_str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % byte_str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        addr_str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % addr_str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class ResultOfWrongType(TestDisassembler):
+    """Return something that is not a DisassemblerResult from disassemble method"""
+
+    class Blah:
+        def __init__(self, length, string):
+            self.length = length
+            self.string = string
+
+    def disassemble(self, info):
+        return self.Blah(1, "ABC")
+
+
+class ResultWrapper(gdb.disassembler.DisassemblerResult):
+    def __init__(self, length, string, length_x=None, string_x=None):
+        super().__init__(length, string)
+        if length_x is None:
+            self.__length = length
+        else:
+            self.__length = length_x
+        if string_x is None:
+            self.__string = string
+        else:
+            self.__string = string_x
+
+    @property
+    def length(self):
+        return self.__length
+
+    @property
+    def string(self):
+        return self.__string
+
+
+class ResultWithInvalidLength(TestDisassembler):
+    """Return a result object with an invalid length."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, 0)
+
+
+class ResultWithInvalidString(TestDisassembler):
+    """Return a result object with an empty string."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, None, "")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super().__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in,
+    as well as a copy of the original DisassembleInfo.
+
+    Once the call into the disassembler is complete then the
+    DisassembleInfo objects become invalid, and any calls into them
+    should trigger an exception."""
+
+    # This is where we cache the DisassembleInfo objects.
+    cached_insn_disas = []
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas.append(info)
+        GlobalCachingDisassembler.cached_insn_disas.append(self.MyInfo(info))
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        for info in GlobalCachingDisassembler.cached_insn_disas:
+            assert isinstance(info, gdb.disassembler.DisassembleInfo)
+            assert not info.is_valid()
+            try:
+                val = info.address
+                raise gdb.GdbError("DisassembleInfo.address is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+            try:
+                val = info.architecture
+                raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.architecture raised an unexpected exception"
+                )
+
+            try:
+                val = info.read_memory(1, 0)
+                raise gdb.GdbError("DisassembleInfo.read_memory is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError("DisassembleInfo.read_memory raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            # Throw a memory error with a specific address.  We don't
+            # expect this address to show up in the output though.
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("the memory source failed")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceNotABufferDisassembler(TestDisassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            return 1234
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceBufferTooLongDisassembler(TestDisassembler):
+    """The read memory returns too many bytes."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            buffer = super().read_memory(length, offset)
+            # Create a new memory view made by duplicating BUFFER.  This
+            # will trigger an error as GDB expects a buffer of exactly
+            # LENGTH to be returned, while this will return a buffer of
+            # 2*LENGTH.
+            return memoryview(
+                bytes([int.from_bytes(x, "little") for x in (list(buffer[0:]) * 2)])
+            )
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class BuiltinDisassembler(Disassembler):
+    """Just calls the builtin disassembler."""
+
+    def __init__(self):
+        super().__init__("BuiltinDisassembler")
+
+    def __call__(self, info):
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class AnalyzingDisassembler(Disassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        """Wrapper around builtin DisassembleInfo type that overrides the
+        read_memory method."""
+
+        def __init__(self, info, start, end, nop_bytes):
+            """INFO is the DisassembleInfo we are wrapping.  START and END are
+            addresses, and NOP_BYTES should be a memoryview object.
+
+            The length (END - START) should be the same as the length
+            of NOP_BYTES.
+
+            Any memory read requests outside the START->END range are
+            serviced normally, but any attempt to read within the
+            START->END range will return content from NOP_BYTES."""
+            super().__init__(info)
+            self._start = start
+            self._end = end
+            self._nop_bytes = nop_bytes
+
+        def _read_replacement(self, length, offset):
+            """Return a slice of the buffer representing the replacement nop
+            instructions."""
+
+            assert self._nop_bytes is not None
+            rb = self._nop_bytes
+
+            # If this request is outside of a nop instruction then we don't know
+            # what to do, so just raise a memory error.
+            if offset >= len(rb) or (offset + length) > len(rb):
+                raise gdb.MemoryError("invalid length and offset combination")
+
+            # Return only the slice of the nop instruction as requested.
+            s = offset
+            e = offset + length
+            return rb[s:e]
+
+        def read_memory(self, length, offset=0):
+            """Callback used by the builtin disassembler to read the contents of
+            memory."""
+
+            # If this request is within the region we are replacing with 'nop'
+            # instructions, then call the helper function to perform that
+            # replacement.
+            if self._start is not None:
+                assert self._end is not None
+                if self.address >= self._start and self.address < self._end:
+                    return self._read_replacement(length, offset)
+
+            # Otherwise, we just forward this request to the default read memory
+            # implementation.
+            return super().read_memory(length, offset)
+
+    def __init__(self):
+        """Constructor."""
+        super().__init__("AnalyzingDisassembler")
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Override the info object, this provides access to our
+        # read_memory function.
+        info = self.MyInfo(info, self._start, self._end, self._nop_bytes)
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            print("APB: Reading nop bytes")
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = self._pass_1_length[replace_idx] // self._pass_1_length[nop_idx]
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+    return dis
+
+
+class InvalidDisassembleInfo(gdb.disassembler.DisassembleInfo):
+    """An attempt to create a DisassembleInfo sub-class without calling
+    the parent class init method.
+
+    Attempts to use instances of this class should throw an error
+    saying that the DisassembleInfo is not valid, despite this class
+    having all of the required attributes.
+
+    The reason why this class will never be valid is that an internal
+    field (within the C++ code) can't be initialized without calling
+    the parent class init method."""
+
+    def __init__(self):
+        assert current_pc is not None
+
+    def is_valid(self):
+        return True
+
+    @property
+    def address(self):
+        global current_pc
+        return current_pc
+
+    @property
+    def architecture(self):
+        return gdb.selected_inferior().architecture()
+
+    @property
+    def progspace(self):
+        return gdb.selected_inferior().progspace
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4



* [PATCHv5 4/5] gdb: refactor the non-printing disassemblers
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
                           ` (2 preceding siblings ...)
  2022-05-06 17:17         ` [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-05-06 17:17         ` Andrew Burgess
  2022-05-06 17:17         ` [PATCHv5 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-06 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit started from an observation I made while working on some
other disassembler patches: the function gdb_buffered_insn_length is
broken ... sort of.

I noticed that the gdb_buffered_insn_length function doesn't set up
the application_data field of the disassemble_info structure.

Further, I noticed that some architectures, ARM for example, require
that the application_data field be set; see gdb_print_insn_arm in
arm-tdep.c.

So, if we ever use gdb_buffered_insn_length for ARM, GDB will likely
crash, which is why I said only "sort of" broken.  Right now we don't
use gdb_buffered_insn_length with ARM, so maybe it isn't broken yet?

Anyway, to prove to myself that there was a problem here, I extended
the disassembler self tests in disasm-selftests.c to include a test of
gdb_buffered_insn_length.  When I run the test for all architectures,
I do indeed see GDB crash for ARM.

To fix this we need gdb_buffered_insn_length to create a disassembler
that inherits from gdb_disassemble_info, but we also need this new
disassembler to not print anything.

So I introduce a new gdb_non_printing_disassembler class: a
disassembler that doesn't print anything to the output stream.

I then observed that both ARC and S12Z also create non-printing
disassemblers, but these are slightly different.  While the
disassembler in gdb_non_printing_disassembler reads the instruction
from a buffer, the ARC and S12Z disassemblers read from target memory
using target_read_code.

So I further split gdb_non_printing_disassembler into two
sub-classes, gdb_non_printing_memory_disassembler and
gdb_non_printing_buffer_disassembler.
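
As a rough illustration of the shape of that split, here is a small
Python model.  This is only a sketch: the real classes are C++ (see
the disasm.h changes below), and every class, method, and parameter
name in the sketch is made up for illustration.  The only assumption
is that the two sub-classes differ in where the instruction bytes
come from.

```python
# Illustrative model only; not the C++ code in this patch.  All names
# below are hypothetical.


class NonPrintingDisassembler:
    """Base class: discards all output; sub-classes supply the bytes."""

    def emit(self, text):
        # Swallow the text, much like a null fprintf-style callback.
        return 0

    def read(self, addr, length):
        raise NotImplementedError("sub-classes choose the byte source")


class NonPrintingBufferDisassembler(NonPrintingDisassembler):
    """Reads instruction bytes from a caller-supplied buffer."""

    def __init__(self, base_addr, buffer):
        self._base = base_addr
        self._buffer = bytes(buffer)

    def read(self, addr, length):
        offset = addr - self._base
        if offset < 0 or offset + length > len(self._buffer):
            raise IndexError("read outside of the buffer")
        return self._buffer[offset : offset + length]


class NonPrintingMemoryDisassembler(NonPrintingDisassembler):
    """Reads instruction bytes through a target-memory callback."""

    def __init__(self, read_code):
        # READ_CODE is a stand-in for something like target_read_code.
        self._read_code = read_code

    def read(self, addr, length):
        return self._read_code(addr, length)
```

In this model a buffer-backed instance would cover the
gdb_buffered_insn_length case, while a memory-backed instance would
cover the ARC and S12Z style uses.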

The new selftests now pass, but otherwise there should be no
user-visible changes after this commit.
---
 gdb/arc-linux-tdep.c   | 15 +++----
 gdb/arc-tdep.c         | 29 +++-----------
 gdb/arc-tdep.h         |  5 ---
 gdb/disasm-selftests.c | 70 ++++++++++++++++++++++++++-------
 gdb/disasm.c           | 88 ++++++++++++++++++------------------------
 gdb/disasm.h           | 56 ++++++++++++++++++++++++---
 gdb/s12z-tdep.c        | 26 +------------
 7 files changed, 158 insertions(+), 131 deletions(-)

diff --git a/gdb/arc-linux-tdep.c b/gdb/arc-linux-tdep.c
index 13595f2e8e9..04ca38f1355 100644
--- a/gdb/arc-linux-tdep.c
+++ b/gdb/arc-linux-tdep.c
@@ -356,7 +356,7 @@ arc_linux_sw_breakpoint_from_kind (struct gdbarch *gdbarch,
    */
 
 static std::vector<CORE_ADDR>
-handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
+handle_atomic_sequence (arc_instruction insn, disassemble_info *di)
 {
   const int atomic_seq_len = 24;    /* Instruction sequence length.  */
   std::vector<CORE_ADDR> next_pcs;
@@ -374,7 +374,7 @@ handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
   for (int insn_count = 0; insn_count < atomic_seq_len; ++insn_count)
     {
       arc_insn_decode (arc_insn_get_linear_next_pc (insn),
-		       &di, arc_delayed_print_insn, &insn);
+		       di, arc_delayed_print_insn, &insn);
 
       if (insn.insn_class == BRCC)
         {
@@ -412,15 +412,15 @@ arc_linux_software_single_step (struct regcache *regcache)
 {
   struct gdbarch *gdbarch = regcache->arch ();
   arc_gdbarch_tdep *tdep = (arc_gdbarch_tdep *) gdbarch_tdep (gdbarch);
-  struct disassemble_info di = arc_disassemble_info (gdbarch);
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   /* Read current instruction.  */
   struct arc_instruction curr_insn;
-  arc_insn_decode (regcache_read_pc (regcache), &di, arc_delayed_print_insn,
-		   &curr_insn);
+  arc_insn_decode (regcache_read_pc (regcache), dis.disasm_info (),
+		   arc_delayed_print_insn, &curr_insn);
 
   if (curr_insn.insn_class == LLOCK)
-    return handle_atomic_sequence (curr_insn, di);
+    return handle_atomic_sequence (curr_insn, dis.disasm_info ());
 
   CORE_ADDR next_pc = arc_insn_get_linear_next_pc (curr_insn);
   std::vector<CORE_ADDR> next_pcs;
@@ -431,7 +431,8 @@ arc_linux_software_single_step (struct regcache *regcache)
   if (curr_insn.has_delay_slot)
     {
       struct arc_instruction next_insn;
-      arc_insn_decode (next_pc, &di, arc_delayed_print_insn, &next_insn);
+      arc_insn_decode (next_pc, dis.disasm_info (), arc_delayed_print_insn,
+		       &next_insn);
       next_pcs.push_back (arc_insn_get_linear_next_pc (next_insn));
     }
   else
diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
index 98bd1c4bc0a..75fd3077ca7 100644
--- a/gdb/arc-tdep.c
+++ b/gdb/arc-tdep.c
@@ -1306,24 +1306,6 @@ arc_is_in_prologue (struct gdbarch *gdbarch, const struct arc_instruction &insn,
   return false;
 }
 
-/* See arc-tdep.h.  */
-
-struct disassemble_info
-arc_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
 /* Analyze the prologue and update the corresponding frame cache for the frame
    unwinder for unwinding frames that doesn't have debug info.  In such
    situation GDB attempts to parse instructions in the prologue to understand
@@ -1394,9 +1376,10 @@ arc_analyze_prologue (struct gdbarch *gdbarch, const CORE_ADDR entrypoint,
   while (current_prologue_end < limit_pc)
     {
       struct arc_instruction insn;
-      struct disassemble_info di = arc_disassemble_info (gdbarch);
-      arc_insn_decode (current_prologue_end, &di, arc_delayed_print_insn,
-		       &insn);
+
+      struct gdb_non_printing_memory_disassembler dis (gdbarch);
+      arc_insn_decode (current_prologue_end, dis.disasm_info (),
+		       arc_delayed_print_insn, &insn);
 
       if (arc_debug)
 	arc_insn_dump (insn);
@@ -2460,8 +2443,8 @@ dump_arc_instruction_command (const char *args, int from_tty)
 
   CORE_ADDR address = value_as_address (val);
   struct arc_instruction insn;
-  struct disassemble_info di = arc_disassemble_info (target_gdbarch ());
-  arc_insn_decode (address, &di, arc_delayed_print_insn, &insn);
+  struct gdb_non_printing_memory_disassembler dis (target_gdbarch ());
+  arc_insn_decode (address, dis.disasm_info (), arc_delayed_print_insn, &insn);
   arc_insn_dump (insn);
 }
 
diff --git a/gdb/arc-tdep.h b/gdb/arc-tdep.h
index ceca003204f..53e5d8476fc 100644
--- a/gdb/arc-tdep.h
+++ b/gdb/arc-tdep.h
@@ -186,11 +186,6 @@ arc_arch_is_em (const struct bfd_arch_info* arch)
    can't be set to an actual NULL value - that would cause a crash.  */
 int arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info);
 
-/* Return properly initialized disassemble_info for ARC disassembler - it will
-   not print disassembled instructions to stderr.  */
-
-struct disassemble_info arc_disassemble_info (struct gdbarch *gdbarch);
-
 /* Get branch/jump target address for the INSN.  Note that this function
    returns branch target and doesn't evaluate if this branch is taken or not.
    For the indirect jumps value depends in register state, hence can change.
diff --git a/gdb/disasm-selftests.c b/gdb/disasm-selftests.c
index 928d26f7018..07586f04abd 100644
--- a/gdb/disasm-selftests.c
+++ b/gdb/disasm-selftests.c
@@ -25,13 +25,19 @@
 
 namespace selftests {
 
-/* Test disassembly of one instruction.  */
+/* Return a pointer to a buffer containing an instruction that can be
+   disassembled for architecture GDBARCH.  *LEN will be set to the length
+   of the returned buffer.
 
-static void
-print_one_insn_test (struct gdbarch *gdbarch)
+   If there's no known instruction to disassemble for GDBARCH (because we
+   haven't figured one out, not because no instructions exist) then nullptr
+   is returned, and *LEN is set to 0.  */
+
+static const gdb_byte *
+get_test_insn (struct gdbarch *gdbarch, size_t *len)
 {
-  size_t len = 0;
-  const gdb_byte *insn = NULL;
+  *len = 0;
+  const gdb_byte *insn = nullptr;
 
   switch (gdbarch_bfd_arch_info (gdbarch)->arch)
     {
@@ -40,34 +46,34 @@ print_one_insn_test (struct gdbarch *gdbarch)
       static const gdb_byte bfin_insn[] = {0x17, 0xe1, 0xff, 0xff};
 
       insn = bfin_insn;
-      len = sizeof (bfin_insn);
+      *len = sizeof (bfin_insn);
       break;
     case bfd_arch_arm:
       /* mov     r0, #0 */
       static const gdb_byte arm_insn[] = {0x0, 0x0, 0xa0, 0xe3};
 
       insn = arm_insn;
-      len = sizeof (arm_insn);
+      *len = sizeof (arm_insn);
       break;
     case bfd_arch_ia64:
     case bfd_arch_mep:
     case bfd_arch_mips:
     case bfd_arch_tic6x:
     case bfd_arch_xtensa:
-      return;
+      return insn;
     case bfd_arch_s390:
       /* nopr %r7 */
       static const gdb_byte s390_insn[] = {0x07, 0x07};
 
       insn = s390_insn;
-      len = sizeof (s390_insn);
+      *len = sizeof (s390_insn);
       break;
     case bfd_arch_xstormy16:
       /* nop */
       static const gdb_byte xstormy16_insn[] = {0x0, 0x0};
 
       insn = xstormy16_insn;
-      len = sizeof (xstormy16_insn);
+      *len = sizeof (xstormy16_insn);
       break;
     case bfd_arch_nios2:
     case bfd_arch_score:
@@ -78,13 +84,13 @@ print_one_insn_test (struct gdbarch *gdbarch)
       {
 	int bplen;
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, 4, &bplen);
-	len = bplen;
+	*len = bplen;
       }
       break;
     case bfd_arch_arc:
       /* PR 21003 */
       if (gdbarch_bfd_arch_info (gdbarch)->mach == bfd_mach_arc_arc601)
-	return;
+	return insn;
       goto generic_case;
     case bfd_arch_i386:
       {
@@ -93,7 +99,7 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	   opcodes rejects an attempt to disassemble for an arch with
 	   a 64-bit address size when bfd_vma is 32-bit.  */
 	if (info->bits_per_address > sizeof (bfd_vma) * CHAR_BIT)
-	  return;
+	  return insn;
       }
       /* fall through */
     default:
@@ -105,12 +111,26 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	int bplen;
 
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, kind, &bplen);
-	len = bplen;
+	*len = bplen;
 
 	break;
       }
     }
-  SELF_CHECK (len > 0);
+  SELF_CHECK (*len > 0);
+
+  return insn;
+}
+
+/* Test disassembly of one instruction.  */
+
+static void
+print_one_insn_test (struct gdbarch *gdbarch)
+{
+  size_t len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &len);
+
+  if (insn == nullptr)
+    return;
 
   /* Test gdb_disassembler for a given gdbarch by reading data from a
      pre-allocated buffer.  If you want to see the disassembled
@@ -175,6 +195,24 @@ print_one_insn_test (struct gdbarch *gdbarch)
   SELF_CHECK (di.print_insn (0) == len);
 }
 
+/* Test the gdb_buffered_insn_length function.  */
+
+static void
+buffered_insn_length_test (struct gdbarch *gdbarch)
+{
+  size_t buf_len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &buf_len);
+
+  if (insn == nullptr)
+    return;
+
+  CORE_ADDR insn_address = 0;
+  int calculated_len = gdb_buffered_insn_length (gdbarch, insn, buf_len,
+						 insn_address);
+
+  SELF_CHECK (calculated_len == buf_len);
+}
+
 /* Test disassembly on memory error.  */
 
 static void
@@ -235,4 +273,6 @@ _initialize_disasm_selftests ()
 					 selftests::print_one_insn_test);
   selftests::register_test_foreach_arch ("memory_error",
 					 selftests::memory_error_test);
+  selftests::register_test_foreach_arch ("buffered_insn_length",
+					 selftests::buffered_insn_length_test);
 }
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 4af40c916b2..53cd6f5b6bb 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -1003,66 +1003,56 @@ gdb_insn_length (struct gdbarch *gdbarch, CORE_ADDR addr)
   return gdb_print_insn (gdbarch, addr, &null_stream, NULL);
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything.  Always returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (2, 3)
-gdb_disasm_null_printf (void *stream, const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_func (void *stream,
+						  const char *format, ...)
 {
   return 0;
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything, and the disassembler is using style.  Always
-   returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (3, 4)
-gdb_disasm_null_styled_printf (void *stream,
-			       enum disassembler_style style,
-			       const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_styled_func
+  (void *stream, enum disassembler_style style, const char *format, ...)
 {
   return 0;
 }
 
 /* See disasm.h.  */
 
-void
-init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
+int
+gdb_non_printing_memory_disassembler::dis_asm_read_memory
+  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+   struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
-			 gdb_disasm_null_styled_printf);
+  return target_read_code (memaddr, myaddr, length);
 }
 
-/* Initialize a struct disassemble_info for gdb_buffered_insn_length.
-   Upon return, *DISASSEMBLER_OPTIONS_HOLDER owns the string pointed
-   to by DI.DISASSEMBLER_OPTIONS.  */
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from a buffer passed to the constructor.  */
 
-static void
-gdb_buffered_insn_length_init_dis (struct gdbarch *gdbarch,
-				   struct disassemble_info *di,
-				   const gdb_byte *insn, int max_len,
-				   CORE_ADDR addr,
-				   std::string *disassembler_options_holder)
+struct gdb_non_printing_buffer_disassembler
+  : public gdb_non_printing_disassembler
 {
-  init_disassemble_info_for_no_printing (di);
-
-  /* init_disassemble_info installs buffer_read_memory, etc.
-     so we don't need to do that here.
-     The cast is necessary until disassemble_info is const-ified.  */
-  di->buffer = (gdb_byte *) insn;
-  di->buffer_length = max_len;
-  di->buffer_vma = addr;
-
-  di->arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di->mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di->endian = gdbarch_byte_order (gdbarch);
-  di->endian_code = gdbarch_byte_order_for_code (gdbarch);
-
-  *disassembler_options_holder = get_all_disassembler_options (gdbarch);
-  if (!disassembler_options_holder->empty ())
-    di->disassembler_options = disassembler_options_holder->c_str ();
-  disassemble_init_for_target (di);
-}
+  /* Constructor.  GDBARCH is the architecture to disassemble for, BUFFER
+     contains the instruction to disassemble, and INSN_ADDRESS is the
+     address (in target memory) of the instruction to disassemble.  */
+  gdb_non_printing_buffer_disassembler (struct gdbarch *gdbarch,
+					gdb::array_view<const gdb_byte> buffer,
+					CORE_ADDR insn_address)
+    : gdb_non_printing_disassembler (gdbarch, nullptr)
+  {
+    /* The cast is necessary until disassemble_info is const-ified.  */
+    m_di.buffer = (gdb_byte *) buffer.data ();
+    m_di.buffer_length = buffer.size ();
+    m_di.buffer_vma = insn_address;
+  }
+};
 
 /* Return the length in bytes of INSN.  MAX_LEN is the size of the
    buffer containing INSN.  */
@@ -1071,14 +1061,10 @@ int
 gdb_buffered_insn_length (struct gdbarch *gdbarch,
 			  const gdb_byte *insn, int max_len, CORE_ADDR addr)
 {
-  struct disassemble_info di;
-  std::string disassembler_options_holder;
-
-  gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
-				     &disassembler_options_holder);
-
-  int result = gdb_print_insn_1 (gdbarch, addr, &di);
-  disassemble_free_target (&di);
+  gdb::array_view<const gdb_byte> buffer
+    = gdb::make_array_view (insn, max_len);
+  gdb_non_printing_buffer_disassembler dis (gdbarch, buffer, addr);
+  int result = gdb_print_insn_1 (gdbarch, addr, dis.disasm_info ());
   return result;
 }
 
diff --git a/gdb/disasm.h b/gdb/disasm.h
index f31ca92b038..ec5120351a1 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -136,6 +136,56 @@ struct gdb_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* A basic disassembler that doesn't actually print anything.  */
+
+struct gdb_non_printing_disassembler : public gdb_disassemble_info
+{
+  gdb_non_printing_disassembler (struct gdbarch *gdbarch,
+				 read_memory_ftype read_memory_func)
+    : gdb_disassemble_info (gdbarch, nullptr /* stream */,
+			    read_memory_func,
+			    nullptr /* memory_error_func */,
+			    nullptr /* print_address_func */,
+			    null_fprintf_func,
+			    null_fprintf_styled_func)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_styled_func (void *stream,
+				       enum disassembler_style style,
+				       const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from target memory.  */
+
+struct gdb_non_printing_memory_disassembler
+  : public gdb_non_printing_disassembler
+{
+  /* Constructor.  GDBARCH is the architecture to disassemble for.  */
+  gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
+    :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
@@ -278,10 +328,4 @@ extern char *get_disassembler_options (struct gdbarch *gdbarch);
 
 extern void set_disassembler_options (const char *options);
 
-/* Setup DINFO with its output function and output stream setup so that
-   nothing is printed while disassembling.  */
-
-extern void init_disassemble_info_for_no_printing
-  (struct disassemble_info *dinfo);
-
 #endif
diff --git a/gdb/s12z-tdep.c b/gdb/s12z-tdep.c
index 5394c1bbf5e..4e33faaea9a 100644
--- a/gdb/s12z-tdep.c
+++ b/gdb/s12z-tdep.c
@@ -141,27 +141,6 @@ s12z_dwarf_reg_to_regnum (struct gdbarch *gdbarch, int num)
 
 /* Support functions for frame handling.  */
 
-
-/* Return a disassemble_info initialized for s12z disassembly, however,
-   the disassembler will not actually print anything.  */
-
-static struct disassemble_info
-s12z_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
-
 /* A struct (based on mem_read_abstraction_base) to read memory
    through the disassemble_info API.  */
 struct mem_read_abstraction
@@ -332,15 +311,14 @@ s12z_frame_cache (struct frame_info *this_frame, void **prologue_cache)
   int frame_size = 0;
   int saved_frame_size = 0;
 
-  struct disassemble_info di = s12z_disassemble_info (gdbarch);
-
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   struct mem_read_abstraction mra;
   mra.base.read = (int (*)(mem_read_abstraction_base*,
 			   int, size_t, bfd_byte*)) abstract_read_memory;
   mra.base.advance = advance ;
   mra.base.posn = posn;
-  mra.info = &di;
+  mra.info = dis.disasm_info ();
 
   while (this_pc > addr)
     {
-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* [PATCHv5 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
                           ` (3 preceding siblings ...)
  2022-05-06 17:17         ` [PATCHv5 4/5] gdb: refactor the non-printing disassemblers Andrew Burgess
@ 2022-05-06 17:17         ` Andrew Burgess
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-06 17:17 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

After the recent restructuring of the disassembler code, GDB has ended
up with two identical class static functions, both called
dis_asm_read_memory, with identical implementations.

My first thought was to move these out of their respective classes,
and just make them global functions, then I'd only need a single
copy.

And maybe that's the right way to go.  But I disliked that, by doing
that, I lose the encapsulation of the method with the corresponding
disassembler class.

So, instead, I placed the static method into its own class, and had
both the gdb_non_printing_memory_disassembler and gdb_disassembler
classes inherit from this new class as an additional base-class.

In terms of code generated, I don't think there's any significant
difference with this approach, but I think this better reflects how
the function is closely tied to the disassembler.

There should be no user visible changes after this commit.
---
 gdb/disasm.c | 16 +++-------------
 gdb/disasm.h | 29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 53cd6f5b6bb..c6edc92930d 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -132,9 +132,9 @@ line_has_code_p (htab_t table, struct symtab *symtab, int line)
 /* Wrapper of target_read_code.  */
 
 int
-gdb_disassembler::dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				       unsigned int len,
-				       struct disassemble_info *info)
+gdb_disassembler_memory_reader::dis_asm_read_memory
+  (bfd_vma memaddr, gdb_byte *myaddr, unsigned int len,
+   struct disassemble_info *info)
 {
   return target_read_code (memaddr, myaddr, len);
 }
@@ -1021,16 +1021,6 @@ gdb_non_printing_disassembler::null_fprintf_styled_func
   return 0;
 }
 
-/* See disasm.h.  */
-
-int
-gdb_non_printing_memory_disassembler::dis_asm_read_memory
-  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
-   struct disassemble_info *dinfo)
-{
-  return target_read_code (memaddr, myaddr, length);
-}
-
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
diff --git a/gdb/disasm.h b/gdb/disasm.h
index ec5120351a1..da03e130526 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -165,31 +165,39 @@ struct gdb_non_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* This is a helper class, for use as an additional base-class, by some of
+   the disassembler classes below.  This class just defines a static method
+   for reading from target memory, which can then be used by the various
+   disassembler sub-classes.  */
+
+struct gdb_disassembler_memory_reader
+{
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
    read from target memory.  */
 
 struct gdb_non_printing_memory_disassembler
-  : public gdb_non_printing_disassembler
+  : public gdb_non_printing_disassembler,
+    private gdb_disassembler_memory_reader
 {
   /* Constructor.  GDBARCH is the architecture to disassemble for.  */
   gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
     :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
   { /* Nothing.  */ }
-
-private:
-
-  /* Implements the read_memory_func disassemble_info callback.  */
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
 };
 
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
-struct gdb_disassembler : public gdb_printing_disassembler
+struct gdb_disassembler : public gdb_printing_disassembler,
+			  private gdb_disassembler_memory_reader
 {
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
     : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
@@ -239,9 +247,6 @@ struct gdb_disassembler : public gdb_printing_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
   static void dis_asm_memory_error (int err, bfd_vma memaddr,
 				    struct disassemble_info *info);
   static void dis_asm_print_address (bfd_vma addr,
-- 
2.25.4


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-06 17:17         ` [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-05-06 18:11           ` Eli Zaretskii
  2022-05-18 10:08             ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2022-05-06 18:11 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> Date: Fri,  6 May 2022 18:17:12 +0100
> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
> 
> diff --git a/gdb/NEWS b/gdb/NEWS
> index 982f4a1a18c..ddbaff51f89 100644
> --- a/gdb/NEWS
> +++ b/gdb/NEWS
> @@ -38,6 +38,40 @@ maintenance info line-table
>       This is the same format that GDB uses when printing address, symbol,
>       and offset information from the disassembler.
>  
> +  ** New Python API for wrapping GDB's disassembler:
> +
> +     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
> +       DISASSEMBLER is a sub-class of gdb.disassembler.Disassembler.
> +       ARCH is either None or a string containing a bfd architecture
> +       name.  DISASSEMBLER is registered as a disassembler for
> +       architecture ARCH, or for all architectures if ARCH is None.
> +       The previous disassembler registered for ARCH is returned, this
> +       can be None if no previous disassembler was registered.
> +
> +     - gdb.disassembler.Disassembler is the class from which all
> +       disassemblers should inherit.  Its constructor takes a string,
> +       a name for the disassembler, which is currently only used in
> +       some debug output.  Sub-classes should override the __call__
> +       method to perform disassembly, invoking __call__ on this base
> +       class will raise an exception.
> +
> +     - gdb.disassembler.DisassembleInfo is the class used to describe
> +       a single disassembly request from GDB.  An instance of this
> +       class is passed to the __call__ method of
> +       gdb.disassembler.Disassembler and has the following read-only
> +       attributes: 'address', and 'architecture', as well as the
> +       following method: 'read_memory'.
> +
> +     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
> +       calls GDB's builtin disassembler on INFO, which is a
> +       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
> +       optional, its default value is None.  If MEMORY_SOURCE is not
> +       None then it must be an object that has a 'read_memory' method.
> +
> +     - gdb.disassembler.DisassemblerResult is a class that can be used
> +       to wrap the result of a call to a Disassembler.  It has
> +       read-only attributes 'length' and 'string'.
> +
>  *** Changes in GDB 12

This part is OK.

> +@smallexample
> +(@value{GDBP}) show architecture
> +The target architecture is set to "auto" (currently "i386").
> +(@value{GDBP}) maint info python-disassemblers
> +Architecture        Disassember Name
> +i386                Disassembler_1	(Matches current architecture)
> +GLOBAL              Disassembler_2
> +(@value{GDBP}) set architecture arm
> +The target architecture is set to "arm".
> +(@value{GDBP}) maint info python-disassemblers
> +quit
> +Architecture        Disassember Name
> +i386                Disassembler_1
> +GLOBAL              Disassembler_2	(Matches current architecture)
> +@end smallexample

It is best to subdivide this long example into at least 2 groups using
@group..@end group.  This way, the example will not be broken between
2 pages in some arbitrary place.  My suggestion is to do this:

  +@smallexample
  +@group
  +(@value{GDBP}) show architecture
  +The target architecture is set to "auto" (currently "i386").
  +(@value{GDBP}) maint info python-disassemblers
  +Architecture        Disassember Name
  +i386                Disassembler_1	(Matches current architecture)
  +GLOBAL              Disassembler_2
  +@end group
  +@group
  +(@value{GDBP}) set architecture arm
  +The target architecture is set to "arm".
  +(@value{GDBP}) maint info python-disassemblers
  +quit
  +Architecture        Disassember Name
  +i386                Disassembler_1
  +GLOBAL              Disassembler_2	(Matches current architecture)
  +@end group
  +@end smallexample

> +Consider, for example, and architecture with 2-byte and 4-byte
                          ^^^
I think you meant "an" there.

> +order to establish if the instruction is 2 or 4 bytes long.  If the
> +instruction is 4-bytes long then the disassembler might then read the
                               ^^^^                        ^^^^
Two "then" in a row; one of them is redundant.

> +@defun DisassemblerResult.__init__ (@var{length}, @var{string})
> +Initialise an instance of this class, @var{length} is the length of
   ^^^^^^^^^^
"Initialize".  We use the US spelling.

> +To see which disassemblers have been registered the @kbd{maint info
> +python-disassemblers} command can be used (@pxref{maint info
> +python-disassemblers}).

It is better to use the active voice: "you can use ... to see which
disassemblers ...".

Thanks.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-06 18:11           ` Eli Zaretskii
@ 2022-05-18 10:08             ` Andrew Burgess
  2022-05-18 12:08               ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-05-18 10:08 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches


Eli,

Thanks for all your reviews of this patch series.  Really appreciate all
your work.

I've updated the patch to address all your feedback.

Thanks,
Andrew

---

commit aa6f556cd63f40ec194091a04306250badafd87b
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Fri Sep 17 18:12:34 2021 +0100

    gdb/python: implement the print_insn extension language hook
    
    This commit extends the Python API to include disassembler support.
    
    The motivation for this commit was to provide an API by which the user
    could write Python scripts that would augment the output of the
    disassembler.
    
    To achieve this I have followed the model of the existing libopcodes
    disassembler, that is, instructions are disassembled one by one.  This
    does restrict the type of things that it is possible to do from a
    Python script, i.e. all additional output has to fit on a single line,
    but this was all I needed, and creating something more complex would,
    I think, require greater changes to how GDB's internal disassembler
    operates.
    
    The disassembler API is contained in the new gdb.disassembler module,
    which defines the following classes:
    
      DisassembleInfo
    
          Similar to libopcodes disassemble_info structure, has read-only
      properties: address, architecture, and progspace.  And has methods:
      __init__, read_memory, and is_valid.
    
          Each time GDB wants an instruction disassembled, an instance of
      this class is passed to a user-written disassembler function.  By
      reading the properties and calling the methods (and other support
      methods in the gdb.disassembler module), the user can perform and
      return the disassembly.
    
      Disassembler
    
          This is a base-class which user written disassemblers should
      inherit from.  This base class provides base implementations of
      __init__ and __call__ which the user written disassembler should
      override.
    
      DisassemblerResult
    
          This class can be used to hold the result of a call to the
      disassembler, it's really just a wrapper around a string (the text
      of the disassembled instruction) and a length (in bytes).  The user
      can return an instance of this class from Disassembler.__call__ to
      represent the newly disassembled instruction.
    
    The gdb.disassembler module also provides the following functions:
    
      register_disassembler
    
          This function registers an instance of a Disassembler sub-class
      as a disassembler, either for one specific architecture, or, as a
      global disassembler for all architectures.
    
      builtin_disassemble
    
          This provides access to GDB's builtin disassembler.  A common
      use case that I see is augmenting the existing disassembler output.
      The user code can call this function to have GDB disassemble the
      instruction in the normal way.  The user gets back a
      DisassemblerResult object, which they can then read in order to
      augment the disassembler output in any way they wish.
    
          This function also provides a mechanism to intercept the
      disassembler's reads of memory, thus the user can adjust what GDB
      sees when it is disassembling.
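
    The memory interception idea can be sketched without GDB as follows.
    FakeInfo is a hypothetical stand-in for DisassembleInfo backed by a
    bytes object, and swap_nibbles is a helper invented for this sketch;
    only the read_memory (length, offset) shape is taken from the API
    described here.

```python
def swap_nibbles(data):
    """Return a copy of DATA with the two 4-bit halves of each byte swapped."""
    return bytes(((b << 4) & 0xF0) | (b >> 4) for b in data)


class FakeInfo:
    # Hypothetical stand-in for gdb.disassembler.DisassembleInfo.
    def __init__(self, memory, address):
        self._memory = memory
        self.address = address

    def read_memory(self, length, offset):
        # Read LENGTH bytes starting OFFSET bytes after self.address.
        start = self.address + offset
        return self._memory[start:start + length]


class NibbleSwapInfo(FakeInfo):
    # Overriding read_memory changes what a consumer of this object
    # sees, without touching the underlying memory.
    def read_memory(self, length, offset):
        return swap_nibbles(super().read_memory(length, offset))
```

    Any code that reads through NibbleSwapInfo.read_memory sees the
    adjusted bytes, which is the effect a DisassembleInfo sub-class is
    intended to have on the builtin disassembler.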
    
    The included documentation provides a more detailed description of the
    API.
    
    There is also a new CLI command added:
    
      maint info python-disassemblers
    
    This command is defined in the Python gdb.disassembler module, and
    can be used to list the currently registered Python disassemblers.
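
    The registration and lookup rules that this command reports on can
    be modelled in a few lines.  This is a simplified sketch, not the
    code from the patch: the names _registry, register, and lookup are
    made up here, and plain strings stand in for disassembler objects.

```python
# Maps an architecture name, or None for the global slot, to a disassembler.
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE and return the previous entry.

    Passing None as DISASSEMBLER removes any existing entry.
    """
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture specific entry, else fall back to the global one."""
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)
```

    With a global entry and an "i386" entry registered, lookup("i386")
    picks the architecture specific one and lookup("arm") falls back to
    the global one, regardless of the order in which they were
    registered.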

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 418094775a5..42a0ebb371b 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index a72fee81550..f5ed294fe8f 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -41,6 +41,40 @@ maintenance info line-table
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a gdb.disassembler.Disassembler
+       sub-class, or None.  ARCH is either None or a bfd architecture
+       name string.  DISASSEMBLER is registered as a disassembler for
+       ARCH, or for all architectures if ARCH is None.  The previous
+       disassembler registered for ARCH is returned; this can be None
+       if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly; invoking __call__ on this base
+       class raises a NotImplementedError exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the read-only attributes
+       'address', 'architecture', and 'progspace', as well as the
+       methods 'read_memory' and 'is_valid'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 8cf97866ccc..9b9a05e4637 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -39539,6 +39539,51 @@
 @item maint info jit
 Print information about JIT code objects loaded in the current inferior.
 
+@anchor{maint info python-disassemblers}
+@kindex maint info python-disassemblers
+@item maint info python-disassemblers
+This command is defined within the @code{gdb.disassembler} Python
+module (@pxref{Disassembly In Python}), and will only be present after
+that module has been imported.  To force the module to be imported do
+the following:
+
+@smallexample
+(@value{GDBP}) python import gdb.disassembler
+@end smallexample
+
+This command lists all the architectures for which a disassembler is
+currently registered, and the name of the disassembler.  If a
+disassembler is registered for all architectures, then this is listed
+last against the @samp{GLOBAL} architecture.
+
+If one of the disassemblers would be selected for the architecture of
+the current inferior, then this disassembler will be marked.
+
+The following example shows a situation in which two disassemblers are
+registered, initially the @samp{i386} disassembler matches the current
+architecture, then the architecture is changed, now the @samp{GLOBAL}
+disassembler matches.
+
+@smallexample
+@group
+(@value{GDBP}) show architecture
+The target architecture is set to "auto" (currently "i386").
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassembler Name
+i386                Disassembler_1	(Matches current architecture)
+GLOBAL              Disassembler_2
+@end group
+@group
+(@value{GDBP}) set architecture arm
+The target architecture is set to "arm".
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassembler Name
+i386                Disassembler_1
+GLOBAL              Disassembler_2	(Matches current architecture)
+@end group
+@end smallexample
+
 @kindex set displaced-stepping
 @kindex show displaced-stepping
 @cindex displaced stepping support
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index cb5283e03c0..86a40e3333b 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction disassembly in Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6562,6 +6565,295 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@cindex python instruction disassembly
+@subsubsection Instruction Disassembly In Python
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.
+
+Instances of this type are usually created within @value{GDBN},
+however, it is possible to create a copy of an instance of this type,
+see the description of @code{__init__} for more details.
+
+This class has the following properties and methods:
+
+@defvar DisassembleInfo.address
+A read-only integer containing the address at which @value{GDBN}
+wishes to disassemble a single instruction.
+@end defvar
+
+@defvar DisassembleInfo.architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling, this property is
+read-only.
+@end defvar
+
+@defvar DisassembleInfo.progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling, this
+property is read-only.
+@end defvar
+
+@defun DisassembleInfo.read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory, calling this method handles this
+detail.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).  The
+length of the returned buffer will always be exactly @var{length}.
+
+If @value{GDBN} is unable to read the required memory then a
+@code{gdb.MemoryError} exception is raised (@pxref{Exception
+Handling}), raising any other exception type from this method is an
+error.
+
+While disassembling a single instruction there could be multiple calls
+to this method, and the same bytes might be read multiple times.  Any
+single call might only read a subset of the total instruction bytes.
+
+Consider, for example, an architecture with 2-byte and 4-byte
+instructions, the disassembler might first read 2-bytes from memory in
+order to establish if the instruction is 2 or 4 bytes long.  If the
+instruction is 4-bytes long the disassembler might then read the
+remaining 2 bytes, or might read the entire 4 bytes again.  The memory
+reading behaviour of the disassembler on different architectures could
+be different.
+@end defun
+
+@defun DisassembleInfo.is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created, has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defun
+
+@defun DisassembleInfo.__init__ (info)
+This can be used to create a new @code{DisassembleInfo} object that is
+a copy of @var{info}.  The copy will have the same @code{address},
+@code{architecture}, and @code{progspace} values as @var{info}, and
+will become invalid at the same time as @var{info}.
+
+This method exists so that sub-classes of @code{DisassembleInfo} can
+be created, these sub-classes must be initialized as copies of an
+existing @code{DisassembleInfo} object, but sub-classes might choose
+to override the @code{read_memory} method, and so control what
+@value{GDBN} sees when reading from memory
+(@pxref{builtin_disassemble}).
+
+@end defun
+
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defun Disassembler.__init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.
+@end defun
+
+@defun Disassembler.__call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Alternatively, this function can return a @code{DisassemblerResult}
+that represents the disassembled instruction, this type is described
+in more detail below.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory, this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error, @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defun
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class is used to hold the result of calling
+@w{@code{Disassembler.__call__}}, and represents a single disassembled
+instruction.  This class has the following properties and methods:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialize an instance of this class, @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+
+@defvar DisassemblerResult.length
+A read-only property containing the length of the disassembled
+instruction in bytes, this will always be greater than zero.
+@end defvar
+
+@defvar DisassemblerResult.string
+A read-only property containing a non-empty string representing the
+disassembled instruction.
+@end defvar
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}, or @code{None}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+If @var{disassembler} is @code{None} then any disassembler currently
+registered for @var{architecture} is removed, the previously
+registered disassembler is still returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+
+You can use the @kbd{maint info python-disassemblers} command
+(@pxref{maint info python-disassemblers}) to see which disassemblers
+have been registered.
+@end defun
+
+@anchor{builtin_disassemble}
+@defun builtin_disassemble (info)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance, or
+sub-class, of @code{DisassembleInfo}.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then a @code{gdb.MemoryError}
+exception will be raised.
+
+When the builtin disassembler needs to read memory the
+@code{read_memory} method on @var{info} will be called, by
+sub-classing @code{DisassembleInfo} and overriding the
+@code{read_memory} method, it is possible to intercept calls to
+@code{read_memory} by the builtin disassembler, and to modify the
+values returned.
+
+It is important to understand that, even when
+@code{DisassembleInfo.read_memory} raises a @code{gdb.MemoryError}, it
+is the internal disassembler itself that reports the memory error to
+@value{GDBN}.  The reason for this is that the disassembler might
+probe memory to see if a byte is readable or not; if the byte can't be
+read then the disassembler may choose not to report an error, but
+instead to disassemble the bytes that it does have available.
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        text = result.string + "\t## Comment"
+        return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
+The following example creates a sub-class of @code{DisassembleInfo} in
+order to intercept the @code{read_memory} calls, within
+@code{read_memory} any bytes read from memory have the two 4-bit
+nibbles swapped around.  This isn't a very useful adjustment, but
+serves as an example.
+
+@smallexample
+class MyInfo(gdb.disassembler.DisassembleInfo):
+    def __init__(self, info):
+        super().__init__(info)
+
+    def read_memory(self, length, offset):
+        buffer = super().read_memory(length, offset)
+        result = bytearray()
+        for b in buffer:
+            v = int.from_bytes(b, 'little')
+            v = (v << 4) & 0xf0 | (v >> 4)
+            result.append(v)
+        return memoryview(result)
+
+class NibbleSwapDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("NibbleSwapDisassembler")
+
+    def __call__(self, info):
+        info = MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+gdb.disassembler.register_disassembler(NibbleSwapDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..5a2d94a5fac
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,178 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+# Re-export everything from the _gdb.disassembler module, which is
+# defined within GDB's C++ code.
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will raise
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    gdb.disassembler.Disassembler sub-class, or None.  ARCHITECTURE is
+    either None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality, this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except Exception:
+            # It's pretty unlikely this exception case will ever
+            # trigger, one situation would be if the user somehow
+            # corrupted the _disassemblers_dict variable such that it
+            # was no longer a dictionary.
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class maint_info_py_disassemblers_cmd(gdb.Command):
+    """
+    List all registered Python disassemblers.
+
+    List the name of all registered Python disassemblers, next to the
+    name of the architecture for which the disassembler is registered.
+
+    The global Python disassembler is listed next to the string
+    'GLOBAL'.
+
+    The disassembler that matches the architecture of the currently
+    selected inferior will be marked, this is an indication of which
+    disassembler will be invoked if any disassembly is performed in
+    the current inferior.
+    """
+
+    def __init__(self):
+        super().__init__("maintenance info python-disassemblers", gdb.COMMAND_USER)
+
+    def invoke(self, args, from_tty):
+        # If no disassemblers are registered, tell the user.
+        if len(_disassemblers_dict) == 0:
+            print("No Python disassemblers registered.")
+            return
+
+        # Figure out the longest architecture name, so we can
+        # correctly format the table of results.
+        longest_arch_name = 0
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                # Measure the architecture name (the first column),
+                # not the disassembler name.
+                longest_arch_name = max(longest_arch_name, len(architecture))
+
+        # Figure out the name of the current architecture.  There
+        # should always be a current inferior, but if, somehow, there
+        # isn't, then leave curr_arch as the empty string, which will
+        # not then match against any architecture in the dictionary.
+        curr_arch = ""
+        if gdb.selected_inferior() is not None:
+            curr_arch = gdb.selected_inferior().architecture().name()
+
+        # Now print the dictionary of registered disassemblers out to
+        # the user.
+        match_tag = "\t(Matches current architecture)"
+        fmt_len = max(longest_arch_name, len("Architecture"))
+        format_string = "{:" + str(fmt_len) + "s} {:s}"
+        print(format_string.format("Architecture", "Disassembler Name"))
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                name = _disassemblers_dict[architecture].name
+                if architecture == curr_arch:
+                    name += match_tag
+                    match_tag = ""
+                print(format_string.format(architecture, name))
+        if None in _disassemblers_dict:
+            name = _disassemblers_dict[None].name + match_tag
+            print(format_string.format("GLOBAL", name))
+
+
+maint_info_py_disassemblers_cmd()
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..c67b2e97664
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,1057 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object
+{
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+
+  /* If copies of this object are created then they are chained together
+     via this NEXT pointer, this allows all the copies to be invalidated at
+     the same time as the parent object.  */
+  struct disasm_info_object *next;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object
+{
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_printing_disassembler that holds a pointer to a
+   Python DisassembleInfo object.  A pointer to an instance of this class
+   is placed in the application_data field of the disassemble_info that
+   is used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Fill in OBJ with all the other arguments.  */
+
+static void
+disasm_info_fill (disasm_info_object *obj, struct gdbarch *gdbarch,
+		  program_space *progspace, bfd_vma address,
+		  disassemble_info *di, disasm_info_object *next)
+{
+  obj->gdbarch = gdbarch;
+  obj->program_space = progspace;
+  obj->address = address;
+  obj->gdb_info = di;
+  obj->next = next;
+}
+
+/* Implement DisassembleInfo.__init__.  Takes a single argument that must
+   be another DisassembleInfo object and copies the contents from the
+   argument into this new object.  */
+
+static int
+disasm_info_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "info", NULL };
+  PyObject *info_obj;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "O!", keywords,
+					&disasm_info_object_type,
+					&info_obj))
+    return -1;
+
+  disasm_info_object *other = (disasm_info_object *) info_obj;
+  disasm_info_object *info = (disasm_info_object *) self;
+  disasm_info_fill (info, other->gdbarch, other->program_space,
+		    other->address, other->gdb_info, other->next);
+  other->next = info;
+
+  /* As the OTHER object now holds a pointer to INFO we inc the ref count
+     on INFO.  This stops INFO being deleted until OTHER has gone away.  */
+  Py_INCREF ((PyObject *) info);
+  return 0;
+}
+
+/* The tp_dealloc callback for the DisassembleInfo type.  */
+
+static void
+disasm_info_dealloc (PyObject *self)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+
+  /* We no longer care about the object our NEXT pointer points at, so we
+     can decrement its reference count.  This macro handles the case when
+     NEXT is nullptr.  */
+  Py_XDECREF ((PyObject *) obj->next);
+
+  /* Now core deallocation behaviour.  */
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  gdbpy_ref<> address_obj = gdb_py_object_from_longest (address);
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj.get ());
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))				\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Initialise OBJ, a DisassemblerResult object with LENGTH and CONTENT.
+   OBJ might already have been initialised, in which case any existing
+   content should be discarded before the new CONTENT is moved in.  */
+
+static void
+disasmpy_init_disassembler_result (disasm_result_object *obj, int length,
+				   std::string content)
+{
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  else
+    obj->content->clear ();
+
+  obj->length = length;
+  *(obj->content) = std::move (content);
+}
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (disasm_info);
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers that don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_SetString (gdbpy_gdb_memory_error, "Unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create a DisassemblerResult containing the results.  */
+  std::string content = disassembler.release ();
+  PyTypeObject *type = &disasm_result_object_type;
+  gdbpy_ref<disasm_result_object> res
+    ((disasm_result_object *) type->tp_alloc (type, 0));
+  disasmpy_init_disassembler_result (res.get (), length, std::move (content));
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement the _gdb.disassembler._set_enabled function.  Takes a boolean
+   parameter and sets whether GDB should enter the Python disassembler code.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback and is
+   called from the libopcodes disassembler when the disassembler wants to
+   read memory.
+
+   From the INFO argument we can find the gdbpy_disassembler object for
+   which we are disassembling, and from that object we can find the
+   DisassembleInfo for the current disassembly call.
+
+   This function reads the instruction bytes by calling the read_memory
+   method on the DisassembleInfo object.  This method might have been
+   overridden by user code.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+
+  /* The DisassembleInfo.read_memory method expects an offset from the
+     address stored within the DisassembleInfo object; calculate that
+     offset here.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+
+  /* Now call the DisassembleInfo.read_memory method.  This might have been
+     overridden by the user.  */
+  gdbpy_ref<> result_obj (PyObject_CallMethod ((PyObject *) obj,
+					       "read_memory",
+					       "IL", len, offset));
+
+  /* Handle any exceptions.  */
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read, if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Buffer returned from read_memory is sized %zd instead of the expected %u"),
+		    py_buff.len, len);
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  disasmpy_init_disassembler_result (obj, length, std::string (string));
+
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying gdbpy_disassembler object, and record MEMADDR as the address
+   at which the memory error occurred.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    disasm_info_fill (m_disasm_info.get (), gdbarch, current_program_space,
+		      memaddr, info, nullptr);
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    /* Invalidate the original DisassembleInfo object as well as any copies
+       that the user might have made.  */
+    for (disasm_info_object *obj = m_disasm_info.get ();
+	 obj != nullptr;
+	 obj = obj->next)
+      obj->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* Import the gdb.disassembler module.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Get the _print_insn attribute from the module, this should be the
+     function we are going to call to actually perform the disassembly.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     "_print_insn"));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fall back to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr = disasm_info->address;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+	  gdbpy_ref<> type_ref (error_type), value_ref (error_value);
+	  gdbpy_ref<> traceback_ref (error_traceback);
+
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      gdbpy_ref<> addr_obj (PyObject_GetAttrString (error_value,
+							    "address"));
+	      if (get_addr_from_python (addr_obj.get (), &addr) < 0)
+		addr = disasm_info->address;
+	    }
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* Check the result is a DisassemblerResult (or a sub-class).  */
+  if (!PyObject_IsInstance (result.get (),
+			    (PyObject *) &disasm_result_object_type))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("Result is not a DisassemblerResult."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("String attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0)
+    {
+      PyErr_SetString
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length must be greater than 0."));
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (length > max_insn_length)
+    {
+      PyErr_Format
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length %ld greater than architecture maximum of %ld"),
+	 length, max_insn_length);
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("String attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm ()
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  disasm_info_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasm_info_dealloc,				/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasm_info_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index b5b8379e23c..084b3687fec 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..432a1c61d02
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,202 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture, this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements.  The
+# first element is the name of a disassembler class in py-disasm.py.
+# The second element is a pattern that should be matched in the
+# disassembler output.
+#
+# Each disassembler tests a different feature of the Python
+# disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "MemorySourceNotABufferDisassembler" "${addr_pattern}Python Exception <class 'TypeError'>: Result from read_memory is not a buffer\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "MemorySourceBufferTooLongDisassembler" "${addr_pattern}Python Exception <class 'ValueError'>: Buffer returned from read_memory is sized $decimal instead of the expected $decimal\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "ResultOfWrongType" "${addr_pattern}Python Exception <class 'TypeError'>: Result is not a DisassemblerResult.\r\n.*"] \
+	 [list "ResultWithInvalidLength" "${addr_pattern}Python Exception <class 'ValueError'>: Invalid length attribute: length must be greater than 0.\r\n.*"] \
+	 [list "ResultWithInvalidString" "${addr_pattern}Python Exception <class 'ValueError'>: String attribute must not be empty.\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check some errors relating to DisassemblerResult creation.
+with_test_prefix "DisassemblerResult errors" {
+    gdb_test "python gdb.disassembler.DisassemblerResult(0, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(-1, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(1, '')" \
+	[multi_line \
+	     "ValueError: String must not be empty." \
+	     "Error while executing Python code."]
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python analyzing_disassembler = add_global_disassembler(AnalyzingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
+
+# Test the 'maint info python-disassemblers' command.
+with_test_prefix "maint info python-disassemblers" {
+    py_remove_all_disassemblers
+    gdb_test "maint info python-disassemblers" "No Python disassemblers registered\\." \
+	"list disassemblers, none registered"
+    gdb_test_no_output "python disasm = add_global_disassembler(BuiltinDisassembler)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "GLOBAL\\s+BuiltinDisassembler\\s+\\(Matches current architecture\\)"] \
+	"list disassemblers, single global disassembler"
+    gdb_test_no_output "python arch = gdb.selected_inferior().architecture().name()"
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(disasm, arch)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "\[^\r\n\]+BuiltinDisassembler\\s+\\(Matches current architecture\\)" \
+	     "GLOBAL\\s+BuiltinDisassembler"] \
+	"list disassemblers, multiple disassemblers registered"
+}
+
+# Check that an attempt to create a "new" DisassembleInfo object fails.
+with_test_prefix "Bad DisassembleInfo creation" {
+    gdb_test_no_output "python my_info = InvalidDisassembleInfo()"
+    gdb_test "python print(my_info.is_valid())" "True"
+    gdb_test "python gdb.disassembler.builtin_disassemble(my_info)" \
+	[multi_line \
+	     "RuntimeError: DisassembleInfo is no longer valid\\." \
+	     "Error while executing Python code\\."]
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..62925ce8c06
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,614 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+# Remove all currently registered disassemblers.
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super().__init__("TestDisassembler")
+        self.__info = None
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        self.__info = info
+        return self.disassemble(info)
+
+    def get_info(self):
+        return self.__info
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        byte_str = ""
+        for o in range(length):
+            if byte_str != "":
+                byte_str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            byte_str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % byte_str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        addr_str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % addr_str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class ResultOfWrongType(TestDisassembler):
+    """Return something that is not a DisassemblerResult from the disassemble method."""
+
+    class Blah:
+        def __init__(self, length, string):
+            self.length = length
+            self.string = string
+
+    def disassemble(self, info):
+        return self.Blah(1, "ABC")
+
+
+class ResultWrapper(gdb.disassembler.DisassemblerResult):
+    def __init__(self, length, string, length_x=None, string_x=None):
+        super().__init__(length, string)
+        if length_x is None:
+            self.__length = length
+        else:
+            self.__length = length_x
+        if string_x is None:
+            self.__string = string
+        else:
+            self.__string = string_x
+
+    @property
+    def length(self):
+        return self.__length
+
+    @property
+    def string(self):
+        return self.__string
+
+
+class ResultWithInvalidLength(TestDisassembler):
+    """Return a result object with an invalid length."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, 0)
+
+
+class ResultWithInvalidString(TestDisassembler):
+    """Return a result object with an empty string."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, None, "")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super().__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in,
+    as well as a copy of the original DisassembleInfo.
+
+    Once the call into the disassembler is complete then the
+    DisassembleInfo objects become invalid, and any calls into them
+    should trigger an exception."""
+
+    # This is where we cache the DisassembleInfo objects.
+    cached_insn_disas = []
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas.append(info)
+        GlobalCachingDisassembler.cached_insn_disas.append(self.MyInfo(info))
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        for info in GlobalCachingDisassembler.cached_insn_disas:
+            assert isinstance(info, gdb.disassembler.DisassembleInfo)
+            assert not info.is_valid()
+            try:
+                val = info.address
+                raise gdb.GdbError("DisassembleInfo.address is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+            try:
+                val = info.architecture
+                raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.architecture raised an unexpected exception"
+                )
+
+            try:
+                val = info.read_memory(1, 0)
+                raise gdb.GdbError("DisassembleInfo.read is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError("DisassembleInfo.read raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            # Throw a memory error with a specific address.  We don't
+            # expect this address to show up in the output though.
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("the memory source failed")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceNotABufferDisassembler(TestDisassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            return 1234
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceBufferTooLongDisassembler(TestDisassembler):
+    """The read memory returns too many bytes."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            buffer = super().read_memory(length, offset)
+            # Create a new memory view made by duplicating BUFFER.  This
+            # will trigger an error as GDB expects a buffer of exactly
+            # LENGTH to be returned, while this will return a buffer of
+            # 2*LENGTH.
+            return memoryview(
+                bytes([int.from_bytes(x, "little") for x in (list(buffer[0:]) * 2)])
+            )
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class BuiltinDisassembler(Disassembler):
+    """Just calls the builtin disassembler."""
+
+    def __init__(self):
+        super().__init__("BuiltinDisassembler")
+
+    def __call__(self, info):
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class AnalyzingDisassembler(Disassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        """Wrapper around builtin DisassembleInfo type that overrides the
+        read_memory method."""
+
+        def __init__(self, info, start, end, nop_bytes):
+            """INFO is the DisassembleInfo we are wrapping.  START and END are
+            addresses, and NOP_BYTES should be a memoryview object.
+
+            The length (END - START) should be the same as the length
+            of NOP_BYTES.
+
+            Any memory read requests outside the START->END range are
+            serviced normally, but any attempt to read within the
+            START->END range will return content from NOP_BYTES."""
+            super().__init__(info)
+            self._start = start
+            self._end = end
+            self._nop_bytes = nop_bytes
+
+        def _read_replacement(self, length, offset):
+            """Return a slice of the buffer representing the replacement nop
+            instructions."""
+
+            assert self._nop_bytes is not None
+            rb = self._nop_bytes
+
+            # If this request is outside of a nop instruction then we don't know
+            # what to do, so just raise a memory error.
+            if offset >= len(rb) or (offset + length) > len(rb):
+                raise gdb.MemoryError("invalid length and offset combination")
+
+            # Return only the slice of the nop instruction as requested.
+            s = offset
+            e = offset + length
+            return rb[s:e]
+
+        def read_memory(self, length, offset=0):
+            """Callback used by the builtin disassembler to read the contents of
+            memory."""
+
+            # If this request is within the region we are replacing with 'nop'
+            # instructions, then call the helper function to perform that
+            # replacement.
+            if self._start is not None:
+                assert self._end is not None
+                if self.address >= self._start and self.address < self._end:
+                    return self._read_replacement(length, offset)
+
+            # Otherwise, we just forward this request to the default read memory
+            # implementation.
+            return super().read_memory(length, offset)
+
+    def __init__(self):
+        """Constructor."""
+        super().__init__("AnalyzingDisassembler")
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Override the info object, this provides access to our
+        # read_memory function.
+        info = self.MyInfo(info, self._start, self._end, self._nop_bytes)
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            print("APB: Reading nop bytes")
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = self._pass_1_length[replace_idx] // self._pass_1_length[nop_idx]
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+    return dis
+
+
+class InvalidDisassembleInfo(gdb.disassembler.DisassembleInfo):
+    """An attempt to create a DisassembleInfo sub-class without calling
+    the parent class init method.
+
+    Attempts to use instances of this class should throw an error
+    saying that the DisassembleInfo is not valid, despite this class
+    having all of the required attributes.
+
+    The reason why this class will never be valid is that an internal
+    field (within the C++ code) can't be initialized without calling
+    the parent class init method."""
+
+    def __init__(self):
+        assert current_pc is not None
+
+    def is_valid(self):
+        return True
+
+    @property
+    def address(self):
+        global current_pc
+        return current_pc
+
+    @property
+    def architecture(self):
+        return gdb.selected_inferior().architecture()
+
+    @property
+    def progspace(self):
+        return gdb.selected_inferior().progspace
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
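
For readers following the two-pass test above, the selection rule in
find_replacement_candidate can be sketched outside of GDB.  This is a
stand-alone model only: pick_replacement, first_match and
expected_after_replacement are hypothetical names, and the two lists
stand in for the _pass_1_length and _pass_1_insn data that the test
collects during the first pass.

```python
def pick_replacement(lengths, insns, nop_idx):
    """Return the index of the instruction to overwrite with nops, or None."""
    nop_len = lengths[nop_idx]

    def first_match(accept):
        # Index 0 is never replaced, mirroring the 'idx > 0' check.
        for idx in range(1, len(lengths)):
            if idx != nop_idx and insns[idx] != "nop" and accept(lengths[idx]):
                return idx
        return None

    # Prefer a longer instruction whose length is a multiple of the nop's.
    idx = first_match(lambda n: n > nop_len and n % nop_len == 0)
    if idx is None:
        # Otherwise settle for one exactly as long as a nop.
        idx = first_match(lambda n: n == nop_len)
    return idx


def expected_after_replacement(lengths, insns, nop_idx):
    """Build the instruction list the second pass is expected to produce."""
    idx = pick_replacement(lengths, insns, nop_idx)
    if idx is None:
        raise ValueError("can't find an instruction to replace")
    expected = list(insns)
    # One instruction becomes as many nops as fit in its byte length.
    expected[idx:idx + 1] = ["nop"] * (lengths[idx] // lengths[nop_idx])
    return expected
```

With lengths [4, 1, 2, 3] and the nop at index 1, the 2-byte
instruction at index 2 is chosen and expands to two nops in the
expected list.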



* Re: [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-18 10:08             ` Andrew Burgess
@ 2022-05-18 12:08               ` Eli Zaretskii
  2022-05-23  8:59                 ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Eli Zaretskii @ 2022-05-18 12:08 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> From: Andrew Burgess <aburgess@redhat.com>
> Cc: gdb-patches@sourceware.org
> Date: Wed, 18 May 2022 11:08:50 +0100
> 
> Thanks for all your reviews of this patch series.  Really appreciate all
> your work.
> 
> I've updated the patch to address all your feedback.

Thanks.  Two last nits:

> +@node Disassembly In Python
> +@cindex python instruction disassembly
> +@subsubsection Instruction Disassembly In Python

@cindex should be after @subsubsection, since you are indexing the
body of the subsection, not the title.

> +If @var{disassembler} is @code{None} then any disassembler currently
> +registered for @var{architecture} is removed, the previously
> +registered disassembler is still returned.

I think this should be rephrased to say

  ... then any disassembler currently registered for
  @var{architecture} is deregistered and returned.

because "removed" is ambiguous in this context, and also "previously
registered" could be interpreted as meaning the disassembler that was
registered _before_ the one you are removing.


* Re: [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-18 12:08               ` Eli Zaretskii
@ 2022-05-23  8:59                 ` Andrew Burgess
  2022-05-23 11:23                   ` Eli Zaretskii
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-05-23  8:59 UTC (permalink / raw)
  To: Eli Zaretskii; +Cc: gdb-patches

Eli Zaretskii via Gdb-patches <gdb-patches@sourceware.org> writes:

>> From: Andrew Burgess <aburgess@redhat.com>
>> Cc: gdb-patches@sourceware.org
>> Date: Wed, 18 May 2022 11:08:50 +0100
>> 
>> Thanks for all your reviews of this patch series.  Really appreciate all
>> your work.
>> 
>> I've updated the patch to address all your feedback.
>
> Thanks.  Two last nits:
>
>> +@node Disassembly In Python
>> +@cindex python instruction disassembly
>> +@subsubsection Instruction Disassembly In Python
>
> @cindex should be after @subsubsection, since you are indexing the
> body of the subsection, not the title.
>
>> +If @var{disassembler} is @code{None} then any disassembler currently
>> +registered for @var{architecture} is removed, the previously
>> +registered disassembler is still returned.
>
> I think this should be rephrased to say
>
>   ... then any disassembler currently registered for
>   @var{architecture} is deregistered and returned.
>
> because "removed" is ambiguous in this context, and also "previously
> registered" could be interpreted as meaning the disassembler that was
> registered _before_ the one you are removing.

I made the changes you suggested.  The revised patch is below.

Thanks,
Andrew

---

commit 4b3190d7a9345034e964191be20db33b9be439f2
Author: Andrew Burgess <andrew.burgess@embecosm.com>
Date:   Fri Sep 17 18:12:34 2021 +0100

    gdb/python: implement the print_insn extension language hook
    
    This commit extends the Python API to include disassembler support.
    
    The motivation for this commit was to provide an API by which the user
    could write Python scripts that would augment the output of the
    disassembler.
    
    To achieve this I have followed the model of the existing libopcodes
    disassembler, that is, instructions are disassembled one by one.  This
    does restrict the type of things that it is possible to do from a
    Python script, i.e. all additional output has to fit on a single line,
    but this was all I needed, and creating something more complex would,
    I think, require greater changes to how GDB's internal disassembler
    operates.
    
    The disassembler API is contained in the new gdb.disassembler module,
    which defines the following classes:
    
      DisassembleInfo
    
          Similar to libopcodes disassemble_info structure, has read-only
      properties: address, architecture, and progspace.  And has methods:
      __init__, read_memory, and is_valid.
    
          Each time GDB wants an instruction disassembled, an instance of
      this class is passed to a user written disassembler function, by
      reading the properties, and calling the methods (and other support
      methods in the gdb.disassembler module) the user can perform and
      return the disassembly.
    
      Disassembler
    
          This is a base-class which user written disassemblers should
      inherit from.  This base class provides base implementations of
      __init__ and __call__ which the user written disassembler should
      override.
    
      DisassemblerResult
    
          This class can be used to hold the result of a call to the
      disassembler, it's really just a wrapper around a string (the text
      of the disassembled instruction) and a length (in bytes).  The user
      can return an instance of this class from Disassembler.__call__ to
      represent the newly disassembled instruction.
    
    The gdb.disassembler module also provides the following functions:
    
      register_disassembler
    
          This function registers an instance of a Disassembler sub-class
      as a disassembler, either for one specific architecture, or, as a
      global disassembler for all architectures.
    
      builtin_disassemble
    
          This provides access to GDB's builtin disassembler.  A common
      use case that I see is augmenting the existing disassembler output.
      The user code can call this function to have GDB disassemble the
      instruction in the normal way.  The user gets back a
      DisassemblerResult object, which they can then read in order to
      augment the disassembler output in any way they wish.
    
          This function also provides a mechanism to intercept the
      disassembler's reads of memory, thus the user can adjust what GDB
      sees when it is disassembling.
    
    The included documentation provides a more detailed description of the
    API.
    
    There is also a new CLI command added:
    
      maint info python-disassemblers
    
    This command is defined in the Python gdb.disassembler module, and
    can be used to list the currently registered Python disassemblers.
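
As an illustration of the flow described above, here is a stand-alone
model that runs without GDB.  Result, fake_builtin,
CommentingDisassembler and print_insn are hypothetical stand-ins for
DisassemblerResult, builtin_disassemble, a Disassembler sub-class and
GDB's dispatch; the real API passes a DisassembleInfo rather than a
bare address, so treat this as a sketch of the control flow only.

```python
from collections import namedtuple

# Stand-in for DisassemblerResult: a length in bytes plus the text.
Result = namedtuple("Result", ["length", "string"])


def fake_builtin(address):
    """Pretend builtin disassembler: every instruction is a 1-byte nop."""
    return Result(1, "nop")


class CommentingDisassembler:
    """Stand-in for a Disassembler sub-class that augments builtin output."""

    def __init__(self, name):
        self.name = name

    def __call__(self, address):
        result = fake_builtin(address)
        return Result(result.length, result.string + "\t## Comment")


def print_insn(registry, arch_name, address):
    """Try the architecture-specific entry, then the global (None) entry."""
    disassembler = registry.get(arch_name, registry.get(None))
    if disassembler is None:
        return fake_builtin(address)
    result = disassembler(address)
    # A None result means "not handled", so fall back to the builtin.
    return fake_builtin(address) if result is None else result
```

Only one disassembler is consulted per instruction in this model: an
architecture-specific entry shadows the global one, and a None result
hands the instruction back to the builtin disassembler.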

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 418094775a5..42a0ebb371b 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index a72fee81550..f5ed294fe8f 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -41,6 +41,40 @@ maintenance info line-table
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a gdb.disassembler.Disassembler
+       sub-class.  ARCH is either None or a string containing a bfd
+       architecture name.  DISASSEMBLER is registered as a disassembler
+       for architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned, this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', and 'architecture', as well as the
+       following method: 'read_memory'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index e5c1ee33aac..7945f863612 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -39544,6 +39544,51 @@
 @item maint info jit
 Print information about JIT code objects loaded in the current inferior.
 
+@anchor{maint info python-disassemblers}
+@kindex maint info python-disassemblers
+@item maint info python-disassemblers
+This command is defined within the @code{gdb.disassembler} Python
+module (@pxref{Disassembly In Python}), and will only be present after
+that module has been imported.  To force the module to be imported do
+the following:
+
+@smallexample
+(@value{GDBP}) python import gdb.disassembler
+@end smallexample
+
+This command lists all the architectures for which a disassembler is
+currently registered, and the name of the disassembler.  If a
+disassembler is registered for all architectures, then this is listed
+last against the @samp{GLOBAL} architecture.
+
+If one of the disassemblers would be selected for the architecture of
+the current inferior, then this disassembler will be marked.
+
+The following example shows a situation in which two disassemblers are
+registered, initially the @samp{i386} disassembler matches the current
+architecture, then the architecture is changed, now the @samp{GLOBAL}
+disassembler matches.
+
+@smallexample
+@group
+(@value{GDBP}) show architecture
+The target architecture is set to "auto" (currently "i386").
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassembler Name
+i386                Disassembler_1	(Matches current architecture)
+GLOBAL              Disassembler_2
+@end group
+@group
+(@value{GDBP}) set architecture arm
+The target architecture is set to "arm".
+(@value{GDBP}) maint info python-disassemblers
+quit
+Architecture        Disassembler Name
+i386                Disassembler_1
+GLOBAL              Disassembler_2	(Matches current architecture)
+@end group
+@end smallexample
+
 @kindex set displaced-stepping
 @kindex show displaced-stepping
 @cindex displaced stepping support
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index cb5283e03c0..29d7d249d6e 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python.
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6562,6 +6565,294 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@subsubsection Instruction Disassembly In Python
+@cindex python instruction disassembly
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.
+
+Instances of this type are usually created within @value{GDBN},
+however, it is possible to create a copy of an instance of this type,
+see the description of @code{__init__} for more details.
+
+This class has the following properties and methods:
+
+@defvar DisassembleInfo.address
+A read-only integer containing the address at which @value{GDBN}
+wishes to disassemble a single instruction.
+@end defvar
+
+@defvar DisassembleInfo.architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling, this property is
+read-only.
+@end defvar
+
+@defvar DisassembleInfo.progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling, this
+property is read-only.
+@end defvar
+
+@defun DisassembleInfo.read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory, calling this method handles this
+detail.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).  The
+length of the returned buffer will always be exactly @var{length}.
+
+If @value{GDBN} is unable to read the required memory then a
+@code{gdb.MemoryError} exception is raised (@pxref{Exception
+Handling}), raising any other exception type from this method is an
+error.
+
+While disassembling a single instruction there could be multiple calls
+to this method, and the same bytes might be read multiple times.  Any
+single call might only read a subset of the total instruction bytes.
+
+Consider, for example, an architecture with 2-byte and 4-byte
+instructions, the disassembler might first read 2 bytes from memory in
+order to establish if the instruction is 2 or 4 bytes long.  If the
+instruction is 4 bytes long the disassembler might then read the
+remaining 2 bytes, or might read the entire 4 bytes again.  The memory
+reading behaviour of the disassembler on different architectures could
+be different.
+@end defun
+
+@defun DisassembleInfo.is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defun
+
+@defun DisassembleInfo.__init__ (info)
+This can be used to create a new @code{DisassembleInfo} object that is
+a copy of @var{info}.  The copy will have the same @code{address},
+@code{architecture}, and @code{progspace} values as @var{info}, and
+will become invalid at the same time as @var{info}.
+
+This method exists so that sub-classes of @code{DisassembleInfo} can
+be created, these sub-classes must be initialized as copies of an
+existing @code{DisassembleInfo} object, but sub-classes might choose
+to override the @code{read_memory} method, and so control what
+@value{GDBN} sees when reading from memory
+(@pxref{builtin_disassemble}).
+
+@end defun
+
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defun Disassembler.__init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.
+@end defun
+
+@defun Disassembler.__call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Alternatively, this function can return a @code{DisassemblerResult}
+that represents the disassembled instruction, this type is described
+in more detail below.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory, this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Any other exception type raised by the @code{__call__} method is an
+error, @value{GDBN} will display the error and then use its builtin
+disassembler to disassemble the instruction instead.
+@end defun
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class is used to hold the result of calling
+@w{@code{Disassembler.__call__}}, and represents a single disassembled
+instruction.  This class has the following properties and methods:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialize an instance of this class, @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+
+@defvar DisassemblerResult.length
+A read-only property containing the length of the disassembled
+instruction in bytes, this will always be greater than zero.
+@end defvar
+
+@defvar DisassemblerResult.string
+A read-only property containing a non-empty string representing the
+disassembled instruction.
+@end defvar
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler} or @code{None}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+If @var{disassembler} is @code{None} then any disassembler currently
+registered for @var{architecture} is deregistered and returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+
+You can use the @kbd{maint info python-disassemblers} command
+(@pxref{maint info python-disassemblers}) to see which disassemblers
+have been registered.
+@end defun
+
+@anchor{builtin_disassemble}
+@defun builtin_disassemble (info)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance, or
+sub-class, of @code{DisassembleInfo}.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned.
+
+If the builtin disassembler fails then a @code{gdb.MemoryError}
+exception will be raised.
+
+When the builtin disassembler needs to read memory the
+@code{read_memory} method on @var{info} will be called, by
+sub-classing @code{DisassembleInfo} and overriding the
+@code{read_memory} method, it is possible to intercept calls to
+@code{read_memory} by the builtin disassembler, and to modify the
+values returned.
+
+It is important to understand that, even when
+@code{DisassembleInfo.read_memory} raises a @code{gdb.MemoryError}, it
+is the internal disassembler itself that reports the memory error to
+@value{GDBN}.  The reason for this is that the disassembler might
+probe memory to see if a byte is readable or not; if the byte can't be
+read then the disassembler may choose not to report an error, but
+instead to disassemble the bytes that it does have available.
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        text = result.string + "\t## Comment"
+        return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
+The following example creates a sub-class of @code{DisassembleInfo} in
+order to intercept the @code{read_memory} calls, within
+@code{read_memory} any bytes read from memory have the two 4-bit
+nibbles swapped around.  This isn't a very useful adjustment, but
+serves as an example.
+
+@smallexample
+class MyInfo(gdb.disassembler.DisassembleInfo):
+    def __init__(self, info):
+        super().__init__(info)
+
+    def read_memory(self, length, offset):
+        buffer = super().read_memory(length, offset)
+        result = bytearray()
+        for b in buffer:
+            v = int.from_bytes(b, 'little')
+            v = (v << 4) & 0xf0 | (v >> 4)
+            result.append(v)
+        return memoryview(result)
+
+class NibbleSwapDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("NibbleSwapDisassembler")
+
+    def __call__(self, info):
+        info = MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+gdb.disassembler.register_disassembler(NibbleSwapDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..5a2d94a5fac
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,178 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+# Re-export everything from the _gdb.disassembler module, which is
+# defined within GDB's C++ code.
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    gdb.disassembler.Disassembler sub-class, or None.  ARCHITECTURE is
+    either None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality, this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except:
+            # It's pretty unlikely this exception case will ever
+            # trigger.  One situation would be if the user somehow
+            # corrupted the _disassemblers_dict variable such that it
+            # was no longer a dictionary.
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class maint_info_py_disassemblers_cmd(gdb.Command):
+    """
+    List all registered Python disassemblers.
+
+    List the name of all registered Python disassemblers, next to the
+    name of the architecture for which the disassembler is registered.
+
+    The global Python disassembler is listed next to the string
+    'GLOBAL'.
+
+    The disassembler that matches the architecture of the currently
+    selected inferior will be marked; this indicates which
+    disassembler will be invoked if any disassembly is performed in
+    the current inferior.
+    """
+
+    def __init__(self):
+        super().__init__("maintenance info python-disassemblers", gdb.COMMAND_USER)
+
+    def invoke(self, args, from_tty):
+        # If no disassemblers are registered, tell the user.
+        if len(_disassemblers_dict) == 0:
+            print("No Python disassemblers registered.")
+            return
+
+        # Figure out the longest architecture name, so we can
+        # correctly format the table of results.
+        longest_arch_name = 0
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                arch_len = len(architecture)
+                if arch_len > longest_arch_name:
+                    longest_arch_name = arch_len
+
+        # Figure out the name of the current architecture.  There
+        # should always be a current inferior, but if, somehow, there
+        # isn't, then leave curr_arch as the empty string, which will
+        # not then match against any architecture in the dictionary.
+        curr_arch = ""
+        if gdb.selected_inferior() is not None:
+            curr_arch = gdb.selected_inferior().architecture().name()
+
+        # Now print the dictionary of registered disassemblers out to
+        # the user.
+        match_tag = "\t(Matches current architecture)"
+        fmt_len = max(longest_arch_name, len("Architecture"))
+        format_string = "{:" + str(fmt_len) + "s} {:s}"
+        print(format_string.format("Architecture", "Disassembler Name"))
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                name = _disassemblers_dict[architecture].name
+                if architecture == curr_arch:
+                    name += match_tag
+                    match_tag = ""
+                print(format_string.format(architecture, name))
+        if None in _disassemblers_dict:
+            name = _disassemblers_dict[None].name + match_tag
+            print(format_string.format("GLOBAL", name))
+
+
+maint_info_py_disassemblers_cmd()
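
Note on the lookup order in disassembler.py above: _print_insn prefers
a disassembler registered for the exact architecture name and only
then falls back to the one registered under None.  The sketch below
models just that behaviour with a plain dictionary so it can be run
outside GDB.  It is not part of the patch; ToyDisassembler, register
and lookup are hypothetical stand-ins for illustration, not GDB API.

```python
# Stand-in sketch of the registry semantics described above.  Nothing
# here touches GDB; every name is hypothetical.

_registry = {}


class ToyDisassembler:
    """Minimal object carrying only a name, like Disassembler.name."""

    def __init__(self, name):
        self.name = name


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE (None means all
    architectures) and return the entry it replaces, if any.
    Passing None as DISASSEMBLER removes the entry."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture-specific entry, then the global one."""
    if arch_name is None:
        return None
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)


register(ToyDisassembler("global"))
register(ToyDisassembler("riscv-only"), "riscv:rv64")
print(lookup("riscv:rv64").name)  # riscv-only
print(lookup("i386").name)  # global

# Unregistering the specific entry exposes the global one again.
old = register(None, "riscv:rv64")
print(old.name, lookup("riscv:rv64").name)  # riscv-only global
```

In the patch itself the same two calls would be
register_disassembler(obj) and register_disassembler(obj, "riscv:rv64"),
with obj an instance of a gdb.disassembler.Disassembler sub-class.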
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..c67b2e97664
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,1057 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object
+{
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB.  This contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+
+  /* If copies of this object are created then they are chained together
+     via this NEXT pointer.  This allows all the copies to be invalidated at
+     the same time as the parent object.  */
+  struct disasm_info_object *next;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object
+{
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling _gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Fill in OBJ with all the other arguments.  */
+
+static void
+disasm_info_fill (disasm_info_object *obj, struct gdbarch *gdbarch,
+		  program_space *progspace, bfd_vma address,
+		  disassemble_info *di, disasm_info_object *next)
+{
+  obj->gdbarch = gdbarch;
+  obj->program_space = progspace;
+  obj->address = address;
+  obj->gdb_info = di;
+  obj->next = next;
+}
+
+/* Implement DisassembleInfo.__init__.  Takes a single argument that must
+   be another DisassembleInfo object and copies the contents from the
+   argument into this new object.  */
+
+static int
+disasm_info_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "info", NULL };
+  PyObject *info_obj;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "O!", keywords,
+					&disasm_info_object_type,
+					&info_obj))
+    return -1;
+
+  disasm_info_object *other = (disasm_info_object *) info_obj;
+  disasm_info_object *info = (disasm_info_object *) self;
+  disasm_info_fill (info, other->gdbarch, other->program_space,
+		    other->address, other->gdb_info, other->next);
+  other->next = info;
+
+  /* As the OTHER object now holds a pointer to INFO we inc the ref count
+     on INFO.  This stops INFO being deleted until OTHER has gone away.  */
+  Py_INCREF ((PyObject *) info);
+  return 0;
+}
+
+/* The tp_dealloc callback for the DisassembleInfo type.  */
+
+static void
+disasm_info_dealloc (PyObject *self)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+
+  /* We no longer care about the object our NEXT pointer points at, so we
+     can decrement its reference count.  This macro handles the case when
+     NEXT is nullptr.  */
+  Py_XDECREF ((PyObject *) obj->next);
+
+  /* Now core deallocation behaviour.  */
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  PyObject *address_obj = gdb_py_object_from_longest (address).release ();
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj);
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))				\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Initialise OBJ, a DisassemblerResult object with LENGTH and CONTENT.
+   OBJ might already have been initialised, in which case any existing
+   content should be discarded before the new CONTENT is moved in.  */
+
+static void
+disasmpy_init_disassembler_result (disasm_result_object *obj, int length,
+				   std::string content)
+{
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  else
+    obj->content->clear ();
+
+  obj->length = length;
+  *(obj->content) = std::move (content);
+}
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (disasm_info);
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  */
+  int length
+    = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+			  disassembler.disasm_info ());
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers that don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	PyErr_SetString (gdbpy_gdb_memory_error, "Unknown disassembly error");
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create a DisassemblerResult containing the results.  */
+  std::string content = disassembler.release ();
+  PyTypeObject *type = &disasm_result_object_type;
+  gdbpy_ref<disasm_result_object> res
+    ((disasm_result_object *) type->tp_alloc (type, 0));
+  disasmpy_init_disassembler_result (res.get (), length, std::move (content));
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement _gdb.disassembler._set_enabled.  Takes a boolean parameter, and
+   sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback and is
+   called from the libopcodes disassembler when the disassembler wants to
+   read memory.
+
+   From the INFO argument we can find the gdbpy_disassembler object for
+   which we are disassembling, and from that object we can find the
+   DisassembleInfo for the current disassembly call.
+
+   This function reads the instruction bytes by calling the read_memory
+   method on the DisassembleInfo object.  This method might have been
+   overridden by user code.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+
+  /* The DisassembleInfo.read_memory method expects an offset from the
+     address stored within the DisassembleInfo object; calculate that
+     offset here.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+
+  /* Now call the DisassembleInfo.read_memory method.  This might have been
+     overridden by the user.  */
+  gdbpy_ref<> result_obj (PyObject_CallMethod ((PyObject *) obj,
+					       "read_memory",
+					       "IL", len, offset));
+
+  /* Handle any exceptions.  */
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read; if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	PyErr_Clear ();
+      else
+	gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get(), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Buffer returned from read_memory is sized %zd instead of the expected %u"),
+		    py_buff.len, len);
+      gdbpy_print_stack ();
+      return -1;
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  disasmpy_init_disassembler_result (obj, length, std::string (string));
+
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying DisassembleInfo Python object, and set a memory error on
+   it.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    disasm_info_fill (m_disasm_info.get (), gdbarch, current_program_space,
+		      memaddr, info, nullptr);
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    /* Invalidate the original DisassembleInfo object as well as any copies
+       that the user might have made.  */
+    for (disasm_info_object *obj = m_disasm_info.get ();
+	 obj != nullptr;
+	 obj = obj->next)
+      obj->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* Import the gdb.disassembler module.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Get the _print_insn attribute from the module, this should be the
+     function we are going to call to actually perform the disassembly.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     "_print_insn"));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we assume means a bug in the
+	 user's code, and print the stack.  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fall back to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  PyObject *error_type, *error_value, *error_traceback;
+	  CORE_ADDR addr;
+
+	  PyErr_Fetch (&error_type, &error_value, &error_traceback);
+
+	  if (error_value != nullptr
+	      && PyObject_HasAttrString (error_value, "address"))
+	    {
+	      PyObject *addr_obj = PyObject_GetAttrString (error_value,
+							   "address");
+	      if (get_addr_from_python (addr_obj, &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  PyErr_Clear ();
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  /* Anything that is not gdb.MemoryError.  */
+	  gdbpy_print_stack ();
+	  return {};
+	}
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* Check the result is a DisassemblerResult (or a sub-class).  */
+  if (!PyObject_IsInstance (result.get (),
+			    (PyObject *) &disasm_result_object_type))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("Result is not a DisassemblerResult."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("String attribute is not a string."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0)
+    {
+      PyErr_SetString
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length must be greater than 0."));
+      gdbpy_print_stack ();
+      return {};
+    }
+  if (length > max_insn_length)
+    {
+      PyErr_Format
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length %ld greater than architecture maximum of %ld"),
+	 length, max_insn_length);
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError, _("String attribute must not be empty."));
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in.", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm ()
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  disasm_info_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasm_info_dealloc,				/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasm_info_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..ed5894c1c3d 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -822,4 +824,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index b5b8379e23c..084b3687fec 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..432a1c61d02
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,202 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements: the
+# first element is the name of a disassembler class in py-disasm.py
+# (or the empty string for no Python disassembler), and the second
+# element is a pattern that should be matched in the disassembler output.
+#
+# Each disassembler tests a different feature of the Python
+# disassembler API.
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "NonMemoryErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error instead of a result\r\nnop\r\n.*"] \
+	 [list "NonMemoryErrorLateDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: non-memory error after builtin disassembler\r\nnop\r\n.*"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "FaultingMemorySourceDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "FailingMemorySourceDisassembler" "${addr_pattern}Python Exception <class 'gdb\\.GdbError'>: the memory source failed\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "MemorySourceNotABufferDisassembler" "${addr_pattern}Python Exception <class 'TypeError'>: Result from read_memory is not a buffer\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "MemorySourceBufferTooLongDisassembler" "${addr_pattern}Python Exception <class 'ValueError'>: Buffer returned from read_memory is sized $decimal instead of the expected $decimal\r\n\r\nCannot access memory at address ${curr_pc_pattern}"] \
+	 [list "ResultOfWrongType" "${addr_pattern}Python Exception <class 'TypeError'>: Result is not a DisassemblerResult.\r\n.*"] \
+	 [list "ResultWithInvalidLength" "${addr_pattern}Python Exception <class 'ValueError'>: Invalid length attribute: length must be greater than 0.\r\n.*"] \
+	 [list "ResultWithInvalidString" "${addr_pattern}Python Exception <class 'ValueError'>: String attribute must not be empty.\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check some errors relating to DisassemblerResult creation.
+with_test_prefix "DisassemblerResult errors" {
+    gdb_test "python gdb.disassembler.DisassemblerResult(0, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(-1, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(1, '')" \
+	[multi_line \
+	     "ValueError: String must not be empty." \
+	     "Error while executing Python code."]
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python analyzing_disassembler = add_global_disassembler(AnalyzingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
+
+# Test the 'maint info python-disassemblers' command.
+with_test_prefix "maint info python-disassemblers" {
+    py_remove_all_disassemblers
+    gdb_test "maint info python-disassemblers" "No Python disassemblers registered\\." \
+	"list disassemblers, none registered"
+    gdb_test_no_output "python disasm = add_global_disassembler(BuiltinDisassembler)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "GLOBAL\\s+BuiltinDisassembler\\s+\\(Matches current architecture\\)"] \
+	"list disassemblers, single global disassembler"
+    gdb_test_no_output "python arch = gdb.selected_inferior().architecture().name()"
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(disasm, arch)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "\[^\r\n\]+BuiltinDisassembler\\s+\\(Matches current architecture\\)" \
+	     "GLOBAL\\s+BuiltinDisassembler"] \
+	"list disassemblers, multiple disassemblers registered"
+}
+
+# Check that an attempt to create a "new" DisassembleInfo object fails.
+with_test_prefix "Bad DisassembleInfo creation" {
+    gdb_test_no_output "python my_info = InvalidDisassembleInfo()"
+    gdb_test "python print(my_info.is_valid())" "True"
+    gdb_test "python gdb.disassembler.builtin_disassemble(my_info)" \
+	[multi_line \
+	     "RuntimeError: DisassembleInfo is no longer valid\\." \
+	     "Error while executing Python code\\."]
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..62925ce8c06
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,614 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+# Remove all currently registered disassemblers.
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super().__init__("TestDisassembler")
+        self.__info = None
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        self.__info = info
+        return self.disassemble(info)
+
+    def get_info(self):
+        return self.__info
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        byte_str = ""
+        for o in range(length):
+            if byte_str != "":
+                byte_str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            byte_str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % byte_str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        addr_str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % addr_str
+        return DisassemblerResult(result.length, text)
+
+
+class NonMemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a non-memory error instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("non-memory error instead of a result")
+
+
+class NonMemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a non-memory error after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("non-memory error after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class ResultOfWrongType(TestDisassembler):
+    """Return something that is not a DisassemblerResult from disassemble method"""
+
+    class Blah:
+        def __init__(self, length, string):
+            self.length = length
+            self.string = string
+
+    def disassemble(self, info):
+        return self.Blah(1, "ABC")
+
+
+class ResultWrapper(gdb.disassembler.DisassemblerResult):
+    def __init__(self, length, string, length_x=None, string_x=None):
+        super().__init__(length, string)
+        if length_x is None:
+            self.__length = length
+        else:
+            self.__length = length_x
+        if string_x is None:
+            self.__string = string
+        else:
+            self.__string = string_x
+
+    @property
+    def length(self):
+        return self.__length
+
+    @property
+    def string(self):
+        return self.__string
+
+
+class ResultWithInvalidLength(TestDisassembler):
+    """Return a result object with an invalid length."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, 0)
+
+
+class ResultWithInvalidString(TestDisassembler):
+    """Return a result object with an empty string."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, None, "")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super().__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in,
+    as well as a copy of the original DisassembleInfo.
+
+    Once the call into the disassembler is complete then the
+    DisassembleInfo objects become invalid, and any calls into them
+    should trigger an exception."""
+
+    # This is where we cache the DisassembleInfo objects.
+    cached_insn_disas = []
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas.append(info)
+        GlobalCachingDisassembler.cached_insn_disas.append(self.MyInfo(info))
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        for info in GlobalCachingDisassembler.cached_insn_disas:
+            assert isinstance(info, gdb.disassembler.DisassembleInfo)
+            assert not info.is_valid()
+            try:
+                val = info.address
+                raise gdb.GdbError("DisassembleInfo.address is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError("DisassembleInfo.address raised an unexpected exception")
+
+            try:
+                val = info.architecture
+                raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.architecture raised an unexpected exception"
+                )
+
+            try:
+                val = info.read_memory(1, 0)
+                raise gdb.GdbError("DisassembleInfo.read_memory is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError("DisassembleInfo.read_memory raised an unexpected exception")
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class FaultingMemorySourceDisassembler(TestDisassembler):
+    """Throw a memory error from the memory source read_memory method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            # Throw a memory error with a specific address.  We don't
+            # expect this address to show up in the output though.
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class FailingMemorySourceDisassembler(TestDisassembler):
+    """Throw a non-memory error from the memory source read_memory method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("the memory source failed")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceNotABufferDisassembler(TestDisassembler):
+    """The read memory returns something that is not a buffer."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            return 1234
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceBufferTooLongDisassembler(TestDisassembler):
+    """The read memory returns too many bytes."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            buffer = super().read_memory(length, offset)
+            # Create a new memory view made by duplicating BUFFER.  This
+            # will trigger an error as GDB expects a buffer of exactly
+            # LENGTH to be returned, while this will return a buffer of
+            # 2*LENGTH.
+            return memoryview(
+                bytes([int.from_bytes(x, "little") for x in (list(buffer[0:]) * 2)])
+            )
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class BuiltinDisassembler(Disassembler):
+    """Just calls the builtin disassembler."""
+
+    def __init__(self):
+        super().__init__("BuiltinDisassembler")
+
+    def __call__(self, info):
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class AnalyzingDisassembler(Disassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        """Wrapper around builtin DisassembleInfo type that overrides the
+        read_memory method."""
+
+        def __init__(self, info, start, end, nop_bytes):
+            """INFO is the DisassembleInfo we are wrapping.  START and END are
+            addresses, and NOP_BYTES should be a memoryview object.
+
+            The length (END - START) should be the same as the length
+            of NOP_BYTES.
+
+            Any memory read requests outside the START->END range are
+            serviced normally, but any attempt to read within the
+            START->END range will return content from NOP_BYTES."""
+            super().__init__(info)
+            self._start = start
+            self._end = end
+            self._nop_bytes = nop_bytes
+
+        def _read_replacement(self, length, offset):
+            """Return a slice of the buffer representing the replacement nop
+            instructions."""
+
+            assert self._nop_bytes is not None
+            rb = self._nop_bytes
+
+            # If this request is outside of a nop instruction then we don't know
+            # what to do, so just raise a memory error.
+            if offset >= len(rb) or (offset + length) > len(rb):
+                raise gdb.MemoryError("invalid length and offset combination")
+
+            # Return only the slice of the nop instruction as requested.
+            s = offset
+            e = offset + length
+            return rb[s:e]
+
+        def read_memory(self, length, offset=0):
+            """Callback used by the builtin disassembler to read the contents of
+            memory."""
+
+            # If this request is within the region we are replacing with 'nop'
+            # instructions, then call the helper function to perform that
+            # replacement.
+            if self._start is not None:
+                assert self._end is not None
+                if self.address >= self._start and self.address < self._end:
+                    return self._read_replacement(length, offset)
+
+            # Otherwise, we just forward this request to the default read memory
+            # implementation.
+            return super().read_memory(length, offset)
+
+    def __init__(self):
+        """Constructor."""
+        super().__init__("AnalyzingDisassembler")
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Override the info object, this provides access to our
+        # read_memory function.
+        info = self.MyInfo(info, self._start, self._end, self._nop_bytes)
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            print("APB: Reading nop bytes")
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = self._pass_1_length[replace_idx] // self._pass_1_length[nop_idx]
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+    return dis
+
+
+class InvalidDisassembleInfo(gdb.disassembler.DisassembleInfo):
+    """An attempt to create a DisassembleInfo sub-class without calling
+    the parent class init method.
+
+    Attempts to use instances of this class should throw an error
+    saying that the DisassembleInfo is not valid, despite this class
+    having all of the required attributes.
+
+    The reason why this class will never be valid is that an internal
+    field (within the C++ code) can't be initialized without calling
+    the parent class init method."""
+
+    def __init__(self):
+        assert current_pc is not None
+
+    def is_valid(self):
+        return True
+
+    @property
+    def address(self):
+        global current_pc
+        return current_pc
+
+    @property
+    def architecture(self):
+        return gdb.selected_inferior().architecture()
+
+    @property
+    def progspace(self):
+        return gdb.selected_inferior().progspace
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
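
For readers following the test logic, here is a minimal sketch of how the
expected instruction list is built in find_replacement_candidate above:
the chosen instruction is swapped for as many 'nop' entries as fit in its
length.  The function name, the instruction names, and the lengths below
are made up for illustration, and the snippet runs outside GDB.

```python
def expected_after_replacement(insns, lengths, nop_idx, replace_idx):
    """Return a copy of INSNS with entry REPLACE_IDX swapped for nops.

    LENGTHS holds the byte length of each instruction in INSNS, and
    NOP_IDX is the index of a known 'nop' instruction."""
    # How many nops fit in the bytes of the instruction being replaced.
    nop_count = lengths[replace_idx] // lengths[nop_idx]
    expected = list(insns)
    # Slice assignment replaces one entry with NOP_COUNT entries.
    expected[replace_idx : replace_idx + 1] = ["nop"] * nop_count
    return expected


# Illustrative data only: a 3-byte 'mov' and a 1-byte 'nop'.
insns = ["push", "nop", "mov", "ret"]
lengths = [1, 1, 3, 1]
expected = expected_after_replacement(insns, lengths, nop_idx=1, replace_idx=2)
print(expected)
```

With this made-up data the 3-byte 'mov' becomes three 1-byte 'nop'
entries, so the list grows from four entries to six.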


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-23  8:59                 ` Andrew Burgess
@ 2022-05-23 11:23                   ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2022-05-23 11:23 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches

> From: Andrew Burgess <aburgess@redhat.com>
> Cc: gdb-patches@sourceware.org
> Date: Mon, 23 May 2022 09:59:05 +0100
> 
> Eli Zaretskii via Gdb-patches <gdb-patches@sourceware.org> writes:
> 
> >> From: Andrew Burgess <aburgess@redhat.com>
> >> Cc: gdb-patches@sourceware.org
> >> Date: Wed, 18 May 2022 11:08:50 +0100
> >> 
> >> Thanks for all your reviews of this patch series.  Really appreciate all
> >> your work.
> >> 
> >> I've updated the patch to address all your feedback.
> >
> > Thanks.  Two last nits:
> >
> >> +@node Disassembly In Python
> >> +@cindex python instruction disassembly
> >> +@subsubsection Instruction Disassembly In Python
> >
> > @cindex should be after @subsubsection, since you are indexing the
> > body of the subsection, not the title.
> >
> >> +If @var{disassembler} is @code{None} then any disassembler currently
> >> +registered for @var{architecture} is removed, the previously
> >> +registered disassembler is still returned.
> >
> > I think this should be rephrased to say
> >
> >   ... then any disassembler currently registered for
> >   @var{architecture} is deregistered and returned.
> >
> > because "removed" is ambiguous in this context, and also "previously
> > registered" could be interpreted as meaning the disassembler that was
> > registered _before_ the one you are removing.
> 
> I made the changes you suggested.  The revised patch is below.

Thanks, LGTM.

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-05 18:17           ` Andrew Burgess
@ 2022-05-24  1:16             ` Simon Marchi
  2022-05-24  8:30               ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Simon Marchi @ 2022-05-24  1:16 UTC (permalink / raw)
  To: Andrew Burgess, gdb-patches; +Cc: Andrew Burgess

Sorry for the late reply, I'm a bit busy with work and personal life.
I gave some answers below.  I don't think I'll have time to do a
thorough review of your v5.  But I already raised any concerns I had,
and I trust you for the rest.

>> Doesn't really matter, but this could probably modify the string in
>> the existing DisassemblerResult object, and then return it:
>>
>>   result.string += "\t## Comment"
>>   return result
>>
>> But I'm fine with what you have, if you think it's clearer for an
>> example.
> 
> The problem is all the properties of DisassemblerResult are read-only.
> Given it's pretty light-weight I didn't really see any problem just
> creating a new one.
> 
> I suspect that I might end up changing that in the future, but I don't
> see any great need to allow for modifications right now.  I figure
> extending the API to allow modifications in the future is fine if/when
> that becomes critical.
> 
> Let me know if that's going to be a problem and I can get the setting
> code added now.

Ack, this is reasonable.
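
For reference, a minimal sketch of the pattern being discussed: because
the result's properties are read-only, the comment is added by building a
new result rather than editing the old one.  Result and with_comment are
hypothetical stand-ins (Result mimics gdb.disassembler.DisassemblerResult
only so the snippet runs outside GDB).

```python
class Result:
    """Hypothetical stand-in for gdb.disassembler.DisassemblerResult.

    As described in the discussion above, its properties are read-only."""

    def __init__(self, length, string):
        self._length = length
        self._string = string

    @property
    def length(self):
        return self._length

    @property
    def string(self):
        return self._string


def with_comment(result, comment):
    """Return a new Result whose text is RESULT's text plus COMMENT."""
    return Result(result.length, result.string + "\t## " + comment)


original = Result(4, "nop")
annotated = with_comment(original, "Comment")
print(annotated.string)
```

The original object is left untouched, which is why creating a fresh
result is cheap and safe here.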

>>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
>>> +{
>>> +  PyObject *info_obj, *memory_source_obj = nullptr;
>>> +  static const char *keywords[] = { "info", "memory_source", nullptr };
>>> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
>>> +					&disasm_info_object_type, &info_obj,
>>> +					&memory_source_obj))
>>
>> I'm wondering, why is there a separate memory_source parameter when info
>> already has a read_memory method, that could potentially be overridden by
>> the user?
> 
> Here's how the API would be used right now:
> 
>   class MyDisassembler(Disassembler):
>     def __init__(self, name):
>       super().__init__(name)
>       self.__info = None
> 
>     def __call__(self, info):
>       self.__info = info
>       return gdb.disassembler.builtin_disassemble(info, self)
> 
>     def read_memory(self, length, offset):
>       return self.__info.read_memory(length, offset)
> 
> This is obviously pretty pointless, the memory source just calls the
> standard read_memory routine so you'll get the same behaviour as if no
> memory source was passed at all, but it shows how the API works.
> 
> If we wanted to override the DisassembleInfo.read_memory routine we'd
> do something like this:
> 
>   class MyInfo(DisassembleInfo):
>     def __init__(self, old_info):
>       super().__init__(old_info)
> 
>     def read_memory(self, length, offset):
>       return super().read_memory(length, offset)
> 
>   class MyDisassembler(Disassembler):
>     def __init__(self, name):
>       super().__init__(name)
> 
>     def __call__(self, info):
>       wrapped_info = MyInfo(info)
>       return gdb.disassembler.builtin_disassemble(wrapped_info)
> 
> What are your thoughts on that?  I think that would be pretty easy to
> implement if you feel it's an improvement.

It's been a bit too long since I've looked at this patch so I've lost
context.  I only remember thinking that instead of passing a
memory_source, you could simply pass an info with a special read_memory
implementation that reads from whatever you want.  So I think
essentially what you did in that last example.  The only reason I
would mention this is that it seems better to me to not have two ways of
doing the same thing, as it's sometimes ambiguous what happens when they
are both used, how they interact together.

>> Is there a use case for other error kinds?  For example, if the
>> disassembler finds that the machine code does not encode a valid
>> instruction, what should it do?
>>
>> Maybe this is more a question for the gdb.disassembler.Disassembler
>> implementation side of the API.
> 
> No!  Like the comment says, everything should disassemble to something,
> even if it's just ".byte xxxx".  The libopcodes disassembler only
> supports reporting one type of error, that's memory error.  It's just
> unfortunate libopcodes also uses the return value to indicate that an
> error occurred, and in some cases disassemblers return -1 (to indicate
> error) without setting a memory error.  In these cases the disassembler
> has probably written to the output stream what went wrong, but this
> really is not how libopcodes is supposed to work.
> 
> So, no.  We either have a memory error, or an unknown error.  Ideally,
> given enough time, libopcodes will be fixed so that we _only_ ever emit
> memory errors.

Ok.  Well, for an API that allows users to arbitrarily extend
functionality, it makes sense to always allow returning a generic error
of some kind.  Perhaps libopcodes can't fail, but perhaps (a bit
stretched example to illustrate) my disassembler makes a GET request to
an HTTP server to do the actual disassembling.  So it seems good to me
to always have a "something went wrong" escape hatch.
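
To illustrate what I mean, here is a rough sketch of a caller that treats
a memory error specially and funnels everything else into a generic
"unknown error" path.  All of the names are hypothetical: FakeMemoryError
stands in for gdb.MemoryError, and remote_disassemble is an invented
backend, so that the snippet runs outside GDB.

```python
class FakeMemoryError(Exception):
    """Hypothetical stand-in for gdb.MemoryError."""


def remote_disassemble(address):
    """Invented backend that can fail for reasons unrelated to memory."""
    if address == 0:
        raise FakeMemoryError("cannot read memory at 0x0")
    if address % 2:
        raise ConnectionError("disassembly server is unreachable")
    return "nop"


def describe(address):
    """Classify the outcome of disassembling the instruction at ADDRESS."""
    try:
        return "ok: " + remote_disassemble(address)
    except FakeMemoryError as exc:
        return "memory error: %s" % exc
    except Exception as exc:
        # The generic "something went wrong" escape hatch.
        return "unknown error: %s" % exc


for addr in (2, 0, 1):
    print(describe(addr))
```

This says nothing about how GDB itself should present such errors; it
only shows the two-way split between memory errors and everything else.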

>> This makes me think, is there a way for a Python user to call into the
>> disassembler?  Should the DisassembleInfo object have a user-callable
>> constructor, should the user want to construct one?
>>
>> I could imagine you could do this out of nowhere:
>>
>>   gdb.disassembler.builtin_disassemble(DisassembleInfo(addr, arch, progspace))
> 
> No! Don't do that.
> 
> We already have gdb.Architecture.disassemble which provides access to
> the disassembler.  You might feel that method is misplaced on
> Architecture (but that wasn't me!), but it is what it is.
> 
> I think if you are writing some random piece of Python code then you
> should not be worrying about Python disassemblers vs builtin
> disassembler; you should just call gdb.Architecture.disassemble and let
> GDB invoke the "correct" disassembler for you.
> 
> Preventing direct calls to gdb.disassembler.builtin_disassemble is one
> of the main reasons that I deliberately don't provide a user callable
> constructor for DisassembleInfo, during development I did have that
> method at one point, and removed it precisely to prevent the above! 

Ack.

>> But that would skip the Python disassemblers, so a user could also want
>> to call this function that doesn't exist today:
>>
>>   gdb.disassemble(DisassembleInfo(addr, arch, progspace))
> 
> I think having:
> 
>   gdb.disassemble(start_address, end_address, architecture, program_space)
> 
> would be better than the current disassemble method on Architecture.
> Thinking about what that does I suspect that I might end up having to
> work on Architecture.disassemble at some point in the future, so I might
> add a top-level gdb.disassemble and make the existing architecture
> method forward to that one.  We'll see.

I think that would make sense.

>> Since it's kind of expected that _print_insn is there, should this be a
>> gdb_assert?  Just returning silently here makes it more difficult to
>> investigate problems, IMO.  The only reason for the assert to trigger
>> would be if someone messed with the GDB Python modules, which I think is
>> ok.
> 
> I've not gone with an assert, but I did rewrite this code so now the
> user will get an error if _print_insn is not present.  I did that by
> removing the HasAttrString check here, and then...
> 
>>
>>> +
>>> +  /* Now grab the callback attribute from the module.  */
>>> +  gdbpy_ref<> hook
>>> +    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
>>> +			     callback_name));
>>> +  if (hook == nullptr)
>>
>> This can't be true, since you already checked with
>> PyObject_HasAttrString.
> 
> ... this check is now useful, the GetAttrString will fail if _print_insn
> is not present, and the PyErr will be set.

Ok.
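
As a rough Python analogue of the C++ change described above (a sketch
only, not the GDB code): skip the separate "has attribute" check, attempt
the lookup, and let a missing attribute surface as an error.  find_hook
and the two module objects are hypothetical names used for illustration.

```python
import types


def find_hook(module, name="_print_insn"):
    """Return attribute NAME from MODULE, raising if it is missing."""
    try:
        return getattr(module, name)
    except AttributeError as exc:
        # Report the problem instead of returning silently.
        raise RuntimeError(
            "module %s has no %s" % (module.__name__, name)
        ) from exc


good = types.ModuleType("fake_disassembler")
good._print_insn = lambda info: None
bad = types.ModuleType("broken_disassembler")

print(find_hook(good) is good._print_insn)
```

Calling find_hook(bad) raises RuntimeError, which is the "user gets an
error" behaviour rather than a silent return.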

Simon

^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-24  1:16             ` Simon Marchi
@ 2022-05-24  8:30               ` Andrew Burgess
  2022-05-25 10:37                 ` Andrew Burgess
  0 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-05-24  8:30 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Andrew Burgess

Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:

> Sorry for the late reply, I'm a bit busy with work and personal life.
> I gave some answers below.  I don't think I'll have time to do a
> thorough review of your v5.  But I already raised any concerns I had,
> and I trust you for the rest.

Understood.  Thanks for all the time you've already put into reviewing
this series.

>
>>> Doesn't really matter, but this could probably modify the string in
>>> the existing DisassemblerResult object, and then return it:
>>>
>>>   result.string += "\t## Comment"
>>>   return result
>>>
>>> But I'm fine with what you have, if you think it's clearer for an
>>> example.
>> 
>> The problem is all the properties of DisassemblerResult are read-only.
>> Given it's pretty light-weight I didn't really see any problem just
>> creating a new one.
>> 
>> I suspect that I might end up changing that in the future, but I don't
>> see any great need to allow for modifications right now.  I figure
>> extending the API to allow modifications in the future is fine if/when
>> that becomes critical.
>> 
>> Let me know if that's going to be a problem and I can get the setting
>> code added now.
>
> Ack, this is reasonable.
>
>>>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
>>>> +{
>>>> +  PyObject *info_obj, *memory_source_obj = nullptr;
>>>> +  static const char *keywords[] = { "info", "memory_source", nullptr };
>>>> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
>>>> +					&disasm_info_object_type, &info_obj,
>>>> +					&memory_source_obj))
>>>
>>> I'm wondering, why is there a separate memory_source parameter when info
>>> already has a read_memory method, that could potentially be overridden by
>>> the user?
>> 
>> Here's how the API would be used right now:
>> 
>>   class MyDisassembler(Disassembler):
>>     def __init__(self, name):
>>       super().__init__(name)
>>       self.__info = None
>> 
>>     def __call__(self, info):
>>       self.__info = info
>>       return gdb.disassembler.builtin_disassemble(info, self)
>> 
>>     def read_memory(self, length, offset):
>>       return self.__info.read_memory(length, offset)
>> 
>> This is obviously pretty pointless, the memory source just calls the
>> standard read_memory routine so you'll get the same behaviour as if no
>> memory source was passed at all, but it shows how the API works.
>> 
>> If we wanted to override the DisassembleInfo.read_memory routine we'd
>> do something like this:
>> 
>>   class MyInfo(DisassembleInfo):
>>     def __init__(self, old_info):
>>       super().__init__(old_info)
>> 
>>     def read_memory(self, length, offset):
>>       return super().read_memory(length, offset)
>> 
>>   class MyDisassembler(Disassembler):
>>     def __init__(self, name):
>>       super().__init__(name)
>> 
>>     def __call__(self, info):
>>       wrapped_info = MyInfo(info)
>>       return gdb.disassembler.builtin_disassemble(wrapped_info)
>> 
>> What are your thoughts on that?  I think that would be pretty easy to
>> implement if you feel it's an improvement.
>
> It's been a bit too long since I've looked at this patch so I've lost
> context.  I only remember thinking that instead of passing a
> memory_source, you could simply pass an info with a special read_memory
> implementation that reads from whatever you want.  So I think
> essentially what you did in that last example.  The only reason I
> would mention this is that it seems better to me to not have two ways of
> doing the same thing, as it's sometimes ambiguous what happens when they
> are both used, how they interact together.

I think what happened is when I started writing the above I was planning
to argue that my "memory_source" approach was best, but as I sketched
out the sub-classing DisassembleInfo approach I realised that your idea
probably was better.

So I updated my patch, and then failed to reword the above text.

I think we're all in alignment now; the way to intercept memory reads
is to sub-class DisassembleInfo (like the second example above), so I
think this issue is resolved.
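
To make the agreed approach concrete, here is a small sketch of a
sub-class whose read_memory serves some bytes from a replacement buffer
and forwards everything else to the parent.  FakeInfo and FakeMemoryError
are hypothetical stand-ins for gdb.disassembler.DisassembleInfo and
gdb.MemoryError so the snippet runs outside GDB; in GDB the sub-class
constructor would instead take the existing info object, as in the second
example quoted above.

```python
class FakeMemoryError(Exception):
    """Hypothetical stand-in for gdb.MemoryError."""


class FakeInfo:
    """Hypothetical stand-in for gdb.disassembler.DisassembleInfo."""

    def __init__(self, address, memory):
        self.address = address
        self._memory = bytes(memory)  # The bytes located at ADDRESS.

    def read_memory(self, length, offset=0):
        if offset < 0 or offset + length > len(self._memory):
            raise FakeMemoryError("read outside of the stand-in buffer")
        return memoryview(self._memory[offset : offset + length])


class PatchedInfo(FakeInfo):
    """Serve reads from PATCH where it covers them, else ask the parent."""

    def __init__(self, address, memory, patch):
        super().__init__(address, memory)
        self._patch = bytes(patch)

    def read_memory(self, length, offset=0):
        if offset >= 0 and offset + length <= len(self._patch):
            return memoryview(self._patch[offset : offset + length])
        return super().read_memory(length, offset)


info = PatchedInfo(0x1000, b"\x01\x02\x03\x04", b"\x90\x90")
print(bytes(info.read_memory(2)))     # Served from the patch buffer.
print(bytes(info.read_memory(2, 2)))  # Forwarded to the parent class.
```

There is only one hook to override here, which avoids the question of
how a separate memory source and an overridden read_memory would interact.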

>
>>> Is there a use case for other error kinds?  For example, if the
>>> disassembler finds that the machine code does not encode a valid
>>> instruction, what should it do?
>>>
>>> Maybe this is more a question for the gdb.disassembler.Disassembler
>>> implementation side of the API.
>> 
>> No!  Like the comment says, everything should disassemble to something,
>> even if it's just ".byte xxxx".  The libopcodes disassembler only
>> supports reporting one type of error, that's memory error.  It's just
>> unfortunate libopcodes also uses the return value to indicate that an
>> error occurred, and in some cases disassemblers return -1 (to indicate
>> error) without setting a memory error.  In these cases the disassembler
>> has probably written to the output stream what went wrong, but this
>> really is not how libopcodes is supposed to work.
>> 
>> So, no.  We either have a memory error, or an unknown error.  Ideally,
>> given enough time, libopcodes will be fixed so that we _only_ ever emit
>> memory errors.
>
> Ok.  Well, for an API that allows users to arbitrarily extend
> functionality, it makes sense to always allow returning a generic error
> of some kind.  Perhaps libopcodes can't fail, but perhaps (a bit
> stretched example to illustrate) my disassembler makes a GET request to
> an HTTP server to do the actual disassembling.  So it seems good to me
> to always have a "something went wrong" escape hatch.

I'll take another pass at this aspect and check how throwing arbitrary
errors from Python code is presented to the user.

>
>>> This makes me think, is there a way for a Python user to call into the
>>> disassembler?  Should the DisassembleInfo object have a user-callable
>>> constructor, should the user want to construct one?
>>>
>>> I could imagine you could do this out of nowhere:
>>>
>>>   gdb.disassembler.builtin_disassemble(DisassembleInfo(addr, arch, progspace))
>> 
>> No! Don't do that.
>> 
>> We already have gdb.Architecture.disassemble which provides access to
>> the disassembler.  You might feel that method is misplaced on
>> Architecture (but that wasn't me!), but it is what it is.
>> 
>> I think if you are writing some random piece of Python code then you
>> should not be worrying about Python disassemblers vs builtin
>> disassembler; you should just call gdb.Architecture.disassemble and let
>> GDB invoke the "correct" disassembler for you.
>> 
>> Preventing direct calls to gdb.disassembler.builtin_disassemble is one
>> of the main reasons that I deliberately don't provide a user callable
>> constructor for DisassembleInfo, during development I did have that
>> method at one point, and removed it precisely to prevent the above! 
>
> Ack.
>
>>> But that would skip the Python disassemblers, so a user could also want
>>> to call this function that doesn't exist today:
>>>
>>>   gdb.disassemble(DisassembleInfo(addr, arch, progspace))
>> 
>> I think having:
>> 
>>   gdb.disassemble(start_address, end_address, architecture, program_space)
>> 
>> would be better than the current disassemble method on Architecture.
>> Thinking about what that does I suspect that I might end up having to
>> work on Architecture.disassemble at some point in the future, so I might
>> add a top-level gdb.disassemble and make the existing architecture
>> method forward to that one.  We'll see.
>
> I think that would make sense.
>
>>> Since it's kind of expected that _print_insn is there, should this be a
>>> gdb_assert?  Just returning silently here makes it more difficult to
>>> investigate problems, IMO.  The only reason for the assert to trigger
>>> would be if someone messed with the GDB Python modules, which I think is
>>> ok.
>> 
>> I've not gone with an assert, but I did rewrite this code so now the
>> user will get an error if _print_insn is not present.  I did that by
>> removing the HasAttrString check here, and then...
>> 
>>>
>>>> +
>>>> +  /* Now grab the callback attribute from the module.  */
>>>> +  gdbpy_ref<> hook
>>>> +    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
>>>> +			     callback_name));
>>>> +  if (hook == nullptr)
>>>
>>> This can't be true, since you already checked with
>>> PyObject_HasAttrString.
>> 
>> ... this check is now useful, the GetAttrString will fail if _print_insn
>> is not present, and the PyErr will be set.
>
> Ok.

I'll look at the remaining issue (handling of exceptions other than
MemoryError) and possibly post a final update.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 80+ messages in thread

* Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook
  2022-05-24  8:30               ` Andrew Burgess
@ 2022-05-25 10:37                 ` Andrew Burgess
  0 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:37 UTC (permalink / raw)
  To: Simon Marchi, gdb-patches; +Cc: Andrew Burgess

Andrew Burgess <aburgess@redhat.com> writes:

> Simon Marchi via Gdb-patches <gdb-patches@sourceware.org> writes:
>
>> Sorry for the late reply, I'm a bit busy with work and personal life.
>> I gave some answers below.  I don't think I'll have time to do a
>> thorough review of your v5.  But I already raised any concerns I had,
>> and I trust you for the rest.
>
> Understood.  Thanks for all the time you've already put into reviewing
> this series.
>
>>
>>>> Doesn't really matter, but this could probably modify the string in
>>>> the existing DisassemblerResult object, and then return it:
>>>>
>>>>   result.string += "\t## Comment"
>>>>   return result
>>>>
>>>> But I'm fine with what you have, if you think it's clearer for an
>>>> example.
>>> 
>>> The problem is all the properties of DisassemblerResult are read-only.
>>> Given it's pretty light-weight I didn't really see any problem just
>>> creating a new one.
>>> 
>>> I suspect that I might end up changing that in the future, but I don't
>>> see any great need to allow for modifications right now.  I figure
>>> extending the API to allow modifications in the future is fine if/when
>>> that becomes critical.
>>> 
>>> Let me know if that's going to be a problem and I can get the setting
>>> code added now.
>>
>> Ack, this is reasonable.
>>
>>>>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
>>>>> +{
>>>>> +  PyObject *info_obj, *memory_source_obj = nullptr;
>>>>> +  static const char *keywords[] = { "info", "memory_source", nullptr };
>>>>> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
>>>>> +					&disasm_info_object_type, &info_obj,
>>>>> +					&memory_source_obj))
>>>>
>>>> I'm wondering, why is there a separate memory_source parameter when info
>>>> already has a read_memory method, that could potentially be overridden by
>>>> the user?
>>> 
>>> Here's how the API would be used right now:
>>> 
>>>   class MyDisassembler(Disassembler):
>>>     def __init__(self, name):
>>>       super().__init__(name)
>>>       self.__info = None
>>> 
>>>     def __call__(self, info):
>>>       self.__info = info
>>>       result = gdb.disassembler.builtin_disassemble(info, self)
>>> 
>>>     def read_memory(self, length, offset):
>>>       return self.__info.read_memory(length, offset)
>>> 
>>> This is obviously pretty pointless, the memory source just calls the
>>> standard read_memory routine so you'll get the same behaviour as if no
>>> memory source was passed at all, but it shows how the API works.
>>> 
>>> If we wanted to override the DisassembleInfo.read_memory routine we'd
>>> do something like this:
>>> 
>>>   class MyInfo(DisassembleInfo):
>>>     def __init__(self,old_info):
>>>       super().__init__(old_info)
>>> 
>>>     def read_memory(self, length, offset):
>>>       return super().read_memory(length, offset)
>>> 
>>>   class MyDisassembler(Disassembler):
>>>     def __init__(self, name):
>>>       super().__init__(name)
>>> 
>>>     def __call__(self, info):
>>>       wrapped_info = MyInfo(info)
>>>       result = gdb.disassembler.builtin_disassemble(wrapped_info)
>>> 
>>> What are your thoughts on that?  I think that would be pretty easy to
>>> implement if you feel it's an improvement.
>>
>> It's been a bit too long since I've looked at this patch so I've lost
>> context.  I only remember thinking that instead of passing a
>> memory_source, you could simply pass an info with a special read_memory
>> implementation that reads from whatever you want.  So I think
>> essentially what you did in that last example.  The only reason I
>> would mention this is that it seems better to me to not have two ways of
>> doing the same thing, as it's sometimes ambiguous what happens when they
>> are both used, how they interact together.
>
> I think what happened is when I started writing the above I was planning
> to argue that my "memory_source" approach was best, but as I sketched
> out the sub-classing DisassembleInfo approach I realised that your idea
> probably was better.
>
> So I updated my patch, and then failed to reword the above text.
>
> I think we're all in alignment now, the way to intercept memory reads
> is to sub-class DisassembleInfo (like the second example above), so I
> think this issue is resolved.
>
>>
>>>> Is there a use case for other error kinds?  For example, if the
>>>> disassembler finds that the machine code does not encode a valid
>>>> instruction, what should it do?
>>>>
>>>> Maybe this is more a question for the gdb.disassembler.Disassembler
>>>> implementation side of the API.
>>> 
>>> No!  Like the comment says, everything should disassemble to something,
>>> even if it's just ".byte xxxx".  The libopcodes disassembler only
>>> supports reporting one type of error, that's memory error.  It's just
>>> unfortunate libopcodes also uses the return value to indicate that an
>>> error occurred, and in some cases disassemblers return -1 (to indicate
>>> error) without setting a memory error.  In these cases the disassembler
>>> has probably written to the output stream what went wrong, but this
>>> really is not how libopcodes is supposed to work.
>>> 
>>> So, no.  We either have a memory error, or an unknown error.  Ideally,
>>> given enough time, libopcodes will be fixed so that we _only_ ever emit
>>> memory errors.
>>
>> Ok.  Well, for an API that allows users to arbitrarily extend
>> functionality, it makes sense to always allow returning a generic error
>> of some kind.  Perhaps libopcodes can't fail, but perhaps (a bit
>> stretched example to illustrate) my disassembler makes a GET request to
>> an HTTP server to do the actual disassembling.  So it seems good to me
>> to always have a "something went wrong" escape hatch.
>
> I'll take another pass at this aspect and check how throwing arbitrary
> errors from Python code is presented to the user.

I've reworked how exceptions are handled through the whole Python
disassembler process now.  If we consider a disassembler like this:

  class ExampleDisassembler(gdb.disassembler.Disassembler):
  
      class InfoWrapper(gdb.disassembler.DisassembleInfo):
          def __init__(self, info):
              super().__init__(info)
  
          def read_memory(self, length, offset):
              buffer = super().read_memory(length, offset)
              return buffer
  
      def __init__(self):
          super().__init__("ExampleDisassembler")
  
      def __call__(self, info):
          info = self.InfoWrapper(info)
          result = gdb.disassembler.builtin_disassemble(info)
          return result

Then this will result in a call stack like this:

  gdbpy_print_insn (py-disasm.c)
    ExampleDisassembler.__call__ (user's Python code)
      disasmpy_builtin_disassemble (py-disasm.c)
        InfoWrapper.read_memory (user's Python code)

We can split the exceptions into 3 types:

  1. gdb.MemoryError, if this is raised in `read_memory` then the builtin
  disassembler might mask this exception, or it might choose to re-raise
  a gdb.MemoryError of its own.  If the MemoryError is raised then this
  will be seen (and can be caught) in the ExampleDisassembler.__call__.
  If the MemoryError is not caught then gdbpy_print_insn will handle it,
  the output looks like:

  (gdb) disassemble 0x0000000000401194,0x0000000000401198
  Dump of assembler code from 0x401194 to 0x401198:
     0x0000000000401194 <main+0>:
  Cannot access memory at address 0x401194

  2. gdb.GdbError, if an instance of this exception is raised in
  `read_memory` then it will propagate back and be catchable in
  ExampleDisassembler.__call__.  If the exception is not handled there
  then it will be caught and handled by gdbpy_print_insn.

  In keeping with how these exceptions are described in the `Exception
  Handling` section of the manual, these exceptions are not treated as
  errors in the Python code, but as a mechanism for the Python code to
  report errors to the user.  If one of these makes it to gdbpy_print_insn,
  this is what the output looks like:

  (gdb) disassemble 0x0000000000401194,0x0000000000401198
  Dump of assembler code from 0x401194 to 0x401198:
     0x0000000000401194 <main+0>: this is a GdbError
  unknown disassembler error (error = -1)

  The whole "unknown disassembler error" is a consequence of the core
  GDB disassembler only having a mechanism to handle memory errors.
  Fixing this completely is not trivial though as I think a perfect
  solution would require updates to libopcodes.  Still, we could
  potentially present this error to the user in a slightly better way,
  but I'd like to defer changes in that area until later - it doesn't
  impact the Python API at all, so we can polish that part later with no
  backwards compatibility worries I think.

  3. Any other exception type.  If any other exception type is raised
  from `read_memory` then, as with GdbError, the exception is propagated
  back, and can be caught in ExampleDisassembler.__call__.  If the
  exception is not caught there then it is handled in gdbpy_print_insn,
  the output looks like this:

  (gdb) disassemble 0x0000000000401194,0x0000000000401198
  Dump of assembler code from 0x401194 to 0x401198:
     0x0000000000401194 <main+0>: Python Exception <class 'RuntimeError'>: this is a RuntimeError
  
  unknown disassembler error (error = -1)

  Other exception types are handled as errors in the user code, and so
  it's possible to get a stack-trace using 'set python print-stack full'.
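
As a rough illustration of the three cases above, here is a small
stand-alone model that runs without GDB.  Nothing in it is the real
gdb module: StubMemoryError and StubGdbError are made-up stand-ins for
gdb.MemoryError and gdb.GdbError, and present() is a made-up stand-in
for the handling in gdbpy_print_insn.  The strings are copied from the
example output above; the real formatting is done by GDB, so treat
this as a sketch of the dispatch order only.

```python
# Stand-alone model, not the real gdb module.  StubMemoryError and
# StubGdbError are hypothetical stand-ins for gdb.MemoryError and
# gdb.GdbError; present() is a hypothetical stand-in for the handling
# done in gdbpy_print_insn.

class StubMemoryError(Exception):
    """Stand-in for gdb.MemoryError."""


class StubGdbError(Exception):
    """Stand-in for gdb.GdbError."""


UNKNOWN = "unknown disassembler error (error = -1)"


def present(disassembler, address):
    """Return the text shown when DISASSEMBLER is run at ADDRESS."""
    try:
        return disassembler(address)
    except StubMemoryError:
        # Case 1: reported as an ordinary memory error.
        return "Cannot access memory at address 0x%x" % address
    except StubGdbError as exc:
        # Case 2: only the message is shown, no Python traceback.
        return "%s\n%s" % (exc, UNKNOWN)
    except Exception as exc:
        # Case 3: treated as a bug in the user's Python code.
        return "Python Exception %s: %s\n\n%s" % (type(exc), exc, UNKNOWN)


def raises(exc):
    """Build a fake disassembler whose only action is to raise EXC."""
    def disassembler(address):
        raise exc
    return disassembler


print(present(raises(StubGdbError("this is a GdbError")), 0x401194))
```

The order of the except clauses matters in the model: the two specific
types are tested before the catch-all, which mirrors the numbered list
above.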

Other significant changes I've made:

  * After gdbpy_print_insn has called ExampleDisassembler.__call__, any
    errors found after this point will be reported as errors and stop
    the disassembler.  Previously, the error would be reported, but the
    default disassembler would then be used.  While working on the
    latest changes I decided that behaviour was not helpful, so removed
    it.

  * There are a few places where the libopcodes disassembler will return
    -1 (indicating an error), without first calling the memory_error
    callback.  As far as I'm concerned, this is a bug in libopcodes.  An
    example of this can be seen by looking for the string '64-bit address
    is disabled' in i386-dis.c.

    Previously, these cases were converted to gdb.MemoryError
    exceptions, however, this meant that there was a difference in
    behaviour between having no Python disassemblers, and a "default",
    pass-through disassembler written in Python.

    I now convert these cases into gdb.GdbError exceptions, with the
    payload of the exception being any text the disassembler has emitted
    so far.

    Combined with the updated handling of GdbError (described above),
    this means that a "default", pass-through disassembler, written in
    Python, gives the exact same output as GDB when no Python
    disassemblers are being used.
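
The conversion described in the second bullet can be modelled roughly
as follows.  This is only a sketch that runs outside GDB:
builtin_disassemble_model, the raw_disassemble callback signature, and
the two Stub exception classes are all invented for the example, and
stand in for disasmpy_builtin_disassemble, the libopcodes
disassembler, gdb.MemoryError, and gdb.GdbError respectively.

```python
# Stand-alone model, not the real gdb module or libopcodes.  All names
# here are hypothetical and exist only for this illustration.

class StubMemoryError(Exception):
    """Stand-in for gdb.MemoryError."""


class StubGdbError(Exception):
    """Stand-in for gdb.GdbError."""


def builtin_disassemble_model(raw_disassemble, address):
    """Run RAW_DISASSEMBLE at ADDRESS and translate its failure modes.

    RAW_DISASSEMBLE (address, emitted, memory_errors) models the
    libopcodes disassembler: it appends output text to the EMITTED
    list, appends an address to MEMORY_ERRORS to model a call to the
    memory_error callback, and returns a length, or -1 on error."""
    emitted = []
    memory_errors = []
    length = raw_disassemble(address, emitted, memory_errors)
    text = "".join(emitted)
    if length >= 0:
        return length, text
    if memory_errors:
        # A memory error was registered before the -1 was returned.
        raise StubMemoryError(
            "Cannot access memory at address 0x%x" % memory_errors[0])
    # -1 with no memory error: carry whatever text was emitted so far.
    raise StubGdbError(text)


def broken_disassembler(address, emitted, memory_errors):
    """Model a disassembler that fails without a memory error."""
    emitted.append("64-bit address is disabled")
    return -1


try:
    builtin_disassemble_model(broken_disassembler, 0x401194)
except StubGdbError as exc:
    print("GdbError payload:", exc)
```

Carrying the emitted text in the exception is what lets a pass-through
disassembler show the same text as the non-Python path, as described
above.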

I'll follow up shortly with an updated version of this series that
includes all the fixes I describe above.  I know you said you're likely
too busy to review any further versions of this series, which is fine,
I'll give the new version a little time for others to look at, and then
I'll go ahead and merge it as I think we probably align on most of this
stuff now.

Thanks,
Andrew



* [PATCHv6 0/6] Add Python API for the disassembler
  2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
                           ` (4 preceding siblings ...)
  2022-05-06 17:17         ` [PATCHv5 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
@ 2022-05-25 10:49         ` Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
                             ` (6 more replies)
  5 siblings, 7 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Changes in v6:

  See: https://sourceware.org/pipermail/gdb-patches/2022-May/189391.html

  - Patch #1 is new, this is required to support changes I made in
    patch #4.

  - Patches #2, #3, #5, and #6 are unchanged since v5.

  - Patch #4 has changed:

    + In gdbpy_print_insn, we now report an error (-1) back to core
      GDB in more cases,

    + In disasmpy_builtin_disassemble we now catch gdbpy_err_fetch
      objects thrown as exceptions and restore them, this allows
      Python exceptions to propagate from
      gdbpy_disassembler::read_memory_func back to the user's Python
      code.  Additionally, we now raise a gdb.GdbError in this
      function if the builtin disassembler has not registered a memory
      error.

    + In gdbpy_disassembler::read_memory_func, we capture any
      exception that is not a gdb.MemoryError and throw it using a
      gdbpy_err_fetch object, this will be caught in
      disasmpy_builtin_disassemble and restored.

    + Tests have been updated and expanded to take account of the new
      exception handling behaviour.  Only tests that exercised the
      exception handling code needed to change, which I was pleased
      about.

    + DOCS - There have been changes to the docs:

      * The DisassembleInfo description has been reordered, and the
        description of DisassembleInfo.read_memory has been mostly
        rewritten.

      * The description of Disassembler.__call__ has been mostly
        rewritten.

      * The description of builtin_disassemble has been mostly
        rewritten.

Changes in v5:

  - Patch #1, minor typo fixes, and reword some comments in line with
    Simon's feedback.  Have not restructured the class hierarchy, this
    was mentioned in Simon's feedback, but he also said he'd accept
    what I have right now.  I think what I have right now does have
    some benefits, so I've stuck with that for now.

  - Patch #2, minor typo fixes based on Simon's feedback.

  - Patch #3, lots of significant changes.

    + Documentation has been updated and expanded significantly,

    + Added a new 'maint info python-disassemblers' command,

    + Removed the memory_source argument to the builtin_disassemble
      function, DisassembleInfo objects can now be sub-classed to
      achieve the same result,

    + Added additional tests to catch more of the error cases, and
      updated the tests that related to the memory_source usage that
      has now been removed.

    + Plus all the minor style issues and typos that Simon pointed
      out.

Changes in v4:

  - Patch #1 from v3 series has been merged,

  - Addressed Eli's feedback on previous series,

  - Rebased onto current upstream/master.

Changes in v3:

  - Rebased to current master, and retested,

  - Patch #1 is new in this series,

  - Patch #2 is changed slightly from v2, I've reworked the
    disassembler classes in a slightly different way now, in order to
    prepare for patches #5 and #6.

  - Patch #3 is unchanged from v2,

  - Patch #4 is unchanged from v2,

  - Patch #5 is new in v3.  I've included it here as the changes in #2
    only make sense knowing that patch #5 is coming,

  - Patch #6 is a small cleanup only possible after #2 and #5 have landed.

Changes in v2:

  - The first 3 patches from the v1 series were merged a while back,
    these were all refactoring, or auxiliary features,

  - There's a new #1 patch in the v2 series that does some new
    refactoring of GDB's disassembler classes, this was required in
    order to simplify the #3 patch,

  - Patch #2 in the v2 series is largely unchanged from patch #4 in
    the v1 series,

  - The syntax highlighting work that was in the v1 series was spun
    out into its own patch, and has been merged separately,

  - The format_address helper function that appeared in the v1 series,
    and that Simon suggested I make more general, was spun out into
    its own patch, and merged separately,

  - Finally, patch #3 in the v2 series is pretty much a complete
    rewrite from the v1 series in order to follow the approach
    suggested by Simon.  Results are now returned directly, either via
    'return' or by raising an exception, in contrast to the original
    approach which involved "setting" the result into an existing
    state object.

---

Andrew Burgess (6):
  gdb/python: convert gdbpy_err_fetch to use gdbpy_ref
  gdb: add new base class to gdb_disassembler
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook
  gdb: refactor the non-printing disassemblers
  gdb: unify two dis_asm_read_memory functions in disasm.c

 gdb/Makefile.in                        |    1 +
 gdb/NEWS                               |   34 +
 gdb/arc-linux-tdep.c                   |   15 +-
 gdb/arc-tdep.c                         |   29 +-
 gdb/arc-tdep.h                         |    5 -
 gdb/arm-tdep.c                         |    4 +-
 gdb/data-directory/Makefile.in         |    1 +
 gdb/disasm-selftests.c                 |   70 +-
 gdb/disasm.c                           |  179 ++--
 gdb/disasm.h                           |  207 ++++-
 gdb/doc/gdb.texinfo                    |   45 +
 gdb/doc/python.texi                    |  328 +++++++
 gdb/extension-priv.h                   |   15 +
 gdb/extension.c                        |   20 +
 gdb/extension.h                        |   10 +
 gdb/guile/guile.c                      |    6 +-
 gdb/mips-tdep.c                        |    4 +-
 gdb/python/lib/gdb/disassembler.py     |  178 ++++
 gdb/python/py-disasm.c                 | 1090 ++++++++++++++++++++++++
 gdb/python/py-utils.c                  |    8 +-
 gdb/python/python-internal.h           |   46 +-
 gdb/python/python.c                    |    3 +
 gdb/s12z-tdep.c                        |   26 +-
 gdb/testsuite/gdb.python/py-disasm.c   |   25 +
 gdb/testsuite/gdb.python/py-disasm.exp |  209 +++++
 gdb/testsuite/gdb.python/py-disasm.py  |  712 ++++++++++++++++
 26 files changed, 3056 insertions(+), 214 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4



* [PATCHv6 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
@ 2022-05-25 10:49           ` Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
                             ` (5 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Convert the gdbpy_err_fetch class to make use of gdbpy_ref, this
removes the need for manual reference count management, and allows the
destructor to be removed.

There should be no functional change after this commit.

I think this cleanup is worth doing on its own, however, in a later
commit I will want to copy instances of gdbpy_err_fetch, and switching
to using gdbpy_ref means that I can rely on the default copy
constructor, without having to add one that handles the reference
counts, so this is good preparation for that upcoming change.
---
 gdb/python/py-utils.c        |  8 ++++----
 gdb/python/python-internal.h | 23 ++++++++++-------------
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/gdb/python/py-utils.c b/gdb/python/py-utils.c
index 63eb4e8078c..57710dd56c8 100644
--- a/gdb/python/py-utils.c
+++ b/gdb/python/py-utils.c
@@ -194,10 +194,10 @@ gdbpy_err_fetch::to_string () const
      Using str (aka PyObject_Str) will fetch the error message from
      gdb.GdbError ("message").  */
 
-  if (m_error_value && m_error_value != Py_None)
-    return gdbpy_obj_to_string (m_error_value);
+  if (m_error_value.get () != nullptr && m_error_value.get () != Py_None)
+    return gdbpy_obj_to_string (m_error_value.get ());
   else
-    return gdbpy_obj_to_string (m_error_type);
+    return gdbpy_obj_to_string (m_error_type.get ());
 }
 
 /* See python-internal.h.  */
@@ -205,7 +205,7 @@ gdbpy_err_fetch::to_string () const
 gdb::unique_xmalloc_ptr<char>
 gdbpy_err_fetch::type_to_string () const
 {
-  return gdbpy_obj_to_string (m_error_type);
+  return gdbpy_obj_to_string (m_error_type.get ());
 }
 
 /* Convert a GDB exception to the appropriate Python exception.
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index d947b96033b..79219c6bb86 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -549,14 +549,12 @@ class gdbpy_err_fetch
 
   gdbpy_err_fetch ()
   {
-    PyErr_Fetch (&m_error_type, &m_error_value, &m_error_traceback);
-  }
+    PyObject *error_type, *error_value, *error_traceback;
 
-  ~gdbpy_err_fetch ()
-  {
-    Py_XDECREF (m_error_type);
-    Py_XDECREF (m_error_value);
-    Py_XDECREF (m_error_traceback);
+    PyErr_Fetch (&error_type, &error_value, &error_traceback);
+    m_error_type.reset (error_type);
+    m_error_value.reset (error_value);
+    m_error_traceback.reset (error_traceback);
   }
 
   /* Call PyErr_Restore using the values stashed in this object.
@@ -565,10 +563,9 @@ class gdbpy_err_fetch
 
   void restore ()
   {
-    PyErr_Restore (m_error_type, m_error_value, m_error_traceback);
-    m_error_type = nullptr;
-    m_error_value = nullptr;
-    m_error_traceback = nullptr;
+    PyErr_Restore (m_error_type.release (),
+		   m_error_value.release (),
+		   m_error_traceback.release ());
   }
 
   /* Return the string representation of the exception represented by
@@ -587,12 +584,12 @@ class gdbpy_err_fetch
 
   bool type_matches (PyObject *type) const
   {
-    return PyErr_GivenExceptionMatches (m_error_type, type);
+    return PyErr_GivenExceptionMatches (m_error_type.get (), type);
   }
 
 private:
 
-  PyObject *m_error_type, *m_error_value, *m_error_traceback;
+  gdbpy_ref<> m_error_type, m_error_value, m_error_traceback;
 };
 
 /* Called before entering the Python interpreter to install the
-- 
2.25.4



* [PATCHv6 2/6] gdb: add new base class to gdb_disassembler
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
@ 2022-05-25 10:49           ` Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 3/6] gdb: add extension language print_insn hook Andrew Burgess
                             ` (4 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

The motivation for this change is an upcoming Python disassembler API
that I would like to add.  As part of that change I need to create a
new disassembler-like class that contains a disassemble_info and a
gdbarch.  The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.

The problem however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method.  In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.

What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.

My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.

In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.

Now, my new Python disassembler sub-class will still want to print
things to an output stream, and so we will want access to the
dis_asm_fprintf functionality for printing.

However, rather than move this printing code into the
gdb_disassemble_info base class, I have added yet another level of
hierarchy, a gdb_printing_disassembler, thus the class structure is
now:

  struct gdb_disassemble_info {};
  struct gdb_printing_disassembler : public gdb_disassemble_info {};
  struct gdb_disassembler : public gdb_printing_disassembler {};

In a later commit my new Python disassembler will inherit from
gdb_printing_disassembler.

The reason for adding the additional layer to the class hierarchy is
that in yet another commit I intend to rewrite the function
gdb_buffered_insn_length, and to do this I will be creating yet more
disassembler-like classes, however, these will not print anything,
thus I will add a gdb_non_printing_disassembler class that also
inherits from gdb_disassemble_info.  Knowing that that change is
coming, I've gone with the above class hierarchy now.

There should be no user visible changes after this commit.
---
 gdb/arm-tdep.c  |   4 +-
 gdb/disasm.c    |  58 +++++++++++++-------
 gdb/disasm.h    | 140 ++++++++++++++++++++++++++++++++++++++----------
 gdb/mips-tdep.c |   4 +-
 4 files changed, 154 insertions(+), 52 deletions(-)

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index 49664093f00..d09e570a619 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -8228,8 +8228,8 @@ arm_displaced_step_fixup (struct gdbarch *gdbarch,
 static int
 gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   if (arm_pc_is_thumb (gdbarch, memaddr))
diff --git a/gdb/disasm.c b/gdb/disasm.c
index f2df5ef7bc5..6ac84388cc3 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -166,7 +166,8 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
 /* Format disassembler output to STREAM.  */
 
 int
-gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+gdb_printing_disassembler::fprintf_func (void *stream,
+					 const char *format, ...)
 {
   va_list args;
 
@@ -180,9 +181,9 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
 /* See disasm.h.  */
 
 int
-gdb_disassembler::dis_asm_styled_fprintf (void *stream,
-					  enum disassembler_style style,
-					  const char *format, ...)
+gdb_printing_disassembler::fprintf_styled_func (void *stream,
+						enum disassembler_style style,
+						const char *format, ...)
 {
   va_list args;
 
@@ -797,26 +798,41 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
-  : m_gdbarch (gdbarch),
+				    read_memory_ftype func)
+  : gdb_printing_disassembler (gdbarch, &m_buffer, func,
+			       dis_asm_memory_error, dis_asm_print_address),
     m_buffer (!use_ext_lang_colorization_p && disassembler_styling
 	      && file->can_emit_style_escape ()),
     m_dest (file)
+{ /* Nothing.  */ }
+
+/* See disasm.h.  */
+
+gdb_disassemble_info::gdb_disassemble_info
+  (struct gdbarch *gdbarch, struct ui_file *stream,
+   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
+   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
+   fprintf_styled_ftype fprintf_styled_func)
+    : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
-			 dis_asm_styled_fprintf);
+  gdb_assert (fprintf_func != nullptr);
+  gdb_assert (fprintf_styled_func != nullptr);
+  init_disassemble_info (&m_di, stream, fprintf_func,
+			 fprintf_styled_func);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
-  m_di.print_address_func = dis_asm_print_address;
-  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
-     disassembler had a local optimization here.  By default it would
-     access the executable file, instead of the target memory (there
-     was a growing list of exceptions though).  Unfortunately, the
-     heuristic was flawed.  Commands like "disassemble &variable"
-     didn't work as they relied on the access going to the target.
-     Further, it has been superseeded by trust-read-only-sections
-     (although that should be superseeded by target_trust..._p()).  */
-  m_di.read_memory_func = read_memory_func;
+
+  /* The memory_error_func, print_address_func, and read_memory_func are
+     all initialized to a default (non-nullptr) value by the call to
+     init_disassemble_info above.  If the user is overriding these fields
+     (by passing non-nullptr values) then do that now, otherwise, leave
+     these fields as the defaults.  */
+  if (memory_error_func != nullptr)
+    m_di.memory_error_func = memory_error_func;
+  if (print_address_func != nullptr)
+    m_di.print_address_func = print_address_func;
+  if (read_memory_func != nullptr)
+    m_di.read_memory_func = read_memory_func;
+
   m_di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
   m_di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
   m_di.endian = gdbarch_byte_order (gdbarch);
@@ -828,7 +844,9 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
   disassemble_init_for_target (&m_di);
 }
 
-gdb_disassembler::~gdb_disassembler ()
+/* See disasm.h.  */
+
+gdb_disassemble_info::~gdb_disassemble_info ()
 {
   disassemble_free_target (&m_di);
 }
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 7efab7db46c..f31ca92b038 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -26,43 +26,137 @@ struct gdbarch;
 struct ui_out;
 struct ui_file;
 
-class gdb_disassembler
-{
-  using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
-
-public:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
-    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
-  {}
+/* A wrapper around a disassemble_info and a gdbarch.  This is the core
+   set of data that all disassembler sub-classes will need.  This class
+   doesn't actually implement the disassembling process, that is something
+   that sub-classes will do, with each sub-class doing things slightly
+   differently.
 
-  ~gdb_disassembler ();
+   The constructor of this class is protected, you should not create
+   instances of this class directly, instead create an instance of an
+   appropriate sub-class.  */
 
-  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
-
-  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+struct gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
 
-  /* Return the gdbarch of gdb_disassembler.  */
+  /* Return the gdbarch we are disassembling for.  */
   struct gdbarch *arch ()
   { return m_gdbarch; }
 
+  /* Return a pointer to the disassemble_info, this will be needed for
+     passing into the libopcodes disassembler.  */
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 protected:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
 
+  /* Types for the function callbacks within m_di.  */
+  using read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using memory_error_ftype = decltype (disassemble_info::memory_error_func);
+  using print_address_ftype = decltype (disassemble_info::print_address_func);
+  using fprintf_ftype = decltype (disassemble_info::fprintf_func);
+  using fprintf_styled_ftype = decltype (disassemble_info::fprintf_styled_func);
+
+  /* Constructor, many fields in m_di are initialized from GDBARCH.  STREAM
+     is where the output of the disassembler will be written too, the
+     remaining arguments are function callbacks that are written into
+     m_di.  Of these function callbacks FPRINTF_FUNC and
+     FPRINTF_STYLED_FUNC must not be nullptr.  If READ_MEMORY_FUNC,
+     MEMORY_ERROR_FUNC, or PRINT_ADDRESS_FUNC are nullptr, then that field
+     within m_di is left with its default value (see the libopcodes
+     function init_disassemble_info for the defaults).  */
+  gdb_disassemble_info (struct gdbarch *gdbarch,
+			struct ui_file *stream,
+			read_memory_ftype read_memory_func,
+			memory_error_ftype memory_error_func,
+			print_address_ftype print_address_func,
+			fprintf_ftype fprintf_func,
+			fprintf_styled_ftype fprintf_styled_func);
+
+  /* Destructor.  */
+  virtual ~gdb_disassemble_info ();
+
+  /* The stream that disassembler output is being written too.  */
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
-private:
-  struct gdbarch *m_gdbarch;
-
   /* Stores data required for disassembling instructions in
      opcodes.  */
   struct disassemble_info m_di;
 
+private:
+  /* The architecture we are disassembling for.  */
+  struct gdbarch *m_gdbarch;
+
   /* If we own the string in `m_di.disassembler_options', we do so
      using this field.  */
   std::string m_disassembler_options_holder;
+};
+
+/* A wrapper around gdb_disassemble_info.  This class adds default
+   print functions that are supplied to the disassemble_info within the
+   parent class.  These default print functions write to the stream, which
+   is also contained in the parent class.
+
+   As with the parent class, the constructor for this class is protected,
+   you should not create instances of this class, but create an
+   appropriate sub-class instead.  */
 
+struct gdb_printing_disassembler : public gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_printing_disassembler);
+
+protected:
+
+  /* Constructor.  All the arguments are just passed to the parent class.
+     We also add the two print functions to the arguments passed to the
+     parent.  See gdb_disassemble_info for a description of how the
+     arguments are handled.  */
+  gdb_printing_disassembler (struct gdbarch *gdbarch,
+			     struct ui_file *stream,
+			     read_memory_ftype read_memory_func,
+			     memory_error_ftype memory_error_func,
+			     print_address_ftype print_address_func)
+    : gdb_disassemble_info (gdbarch, stream, read_memory_func,
+			    memory_error_func, print_address_func,
+			    fprintf_func, fprintf_styled_func)
+  { /* Nothing.  */ }
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_styled_func (void *stream,
+				  enum disassembler_style style,
+				  const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A disassembler class that provides 'print_insn', a method for
+   disassembling a single instruction to the output stream.  */
+
+struct gdb_disassembler : public gdb_printing_disassembler
+{
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
+    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
+
+  /* Disassemble a single instruction at MEMADDR to the ui_file* that was
+     passed to the constructor.  If a memory error occurs while
+     disassembling this instruction then an error will be thrown.  */
+  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+
+protected:
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    read_memory_ftype func);
+
+private:
   /* This member variable is given a value by calling dis_asm_memory_error.
      If after calling into the libopcodes disassembler we get back a
      negative value (which indicates an error), then, if this variable has
@@ -95,16 +189,6 @@ class gdb_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_fprintf (void *stream, const char *format, ...)
-    ATTRIBUTE_PRINTF(2,3);
-
-  /* Print formatted message to STREAM, the content can be styled based on
-     STYLE if desired.  */
-  static int dis_asm_styled_fprintf (void *stream,
-				     enum disassembler_style style,
-				     const char *format, ...)
-    ATTRIBUTE_PRINTF(3,4);
-
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index 805c5beba59..65aa86dd98d 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7021,8 +7021,8 @@ reinit_frame_cache_sfunc (const char *args, int from_tty,
 static int
 gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
-- 
2.25.4



* [PATCHv6 3/6] gdb: add extension language print_insn hook
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-05-25 10:49           ` Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
                             ` (3 subsequent siblings)
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit is preparation for the next commit.

In the next commit I will add a Python API to intercept the print_insn
calls within GDB.  Each print_insn call is responsible for
disassembling and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.
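
As a rough illustration of that control flow, here is a small Python
model (not GDB's C++ code).  The function and hook names are
hypothetical stand-ins: each hook returns an instruction length or None
to decline, and FALLBACK plays the role of gdbarch_print_insn.

```python
def print_insn_with_hooks(hooks, fallback, address):
    """Model of the lookup order: extension hooks first, then fallback.

    HOOKS is a list of callables (or None for a language with no hook);
    each returns an instruction length, or None to decline.  FALLBACK
    stands in for gdbarch_print_insn and always returns a length.
    """
    for hook in hooks:
        if hook is None:
            # This extension language does not implement print_insn.
            continue
        length = hook(address)
        if length is not None:
            # First hook to produce a result wins; later hooks are skipped.
            return length
    # No extension language wanted to do the disassembly.
    return fallback(address)


def declining_hook(address):
    """A hook that never handles the instruction."""
    return None


def four_byte_hook(address):
    """A hook that claims every instruction is four bytes long."""
    return 4


def builtin(address):
    """Stand-in for the architecture's own instruction printer."""
    return 2
```

With only declining (or missing) hooks the fallback result is used,
which matches the state after this commit, where both Python and Guile
leave the callback as nullptr.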

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.

There should be no user visible change after this commit.
---
 gdb/disasm.c         | 29 ++++++++++++++++++++++++++---
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 10 ++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 6ac84388cc3..4af40c916b2 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -851,6 +851,29 @@ gdb_disassemble_info::~gdb_disassemble_info ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 /* See disasm.h.  */
 
 bool gdb_disassembler::use_ext_lang_colorization_p = true;
@@ -864,7 +887,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
   m_err_memaddr.reset ();
   m_buffer.clear ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   /* If we have successfully disassembled an instruction, styling is on, we
      think that the extension language might be able to perform styling for
@@ -899,7 +922,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
 	  gdb_assert (!m_buffer.term_out ());
 	  m_buffer.~string_file ();
 	  new (&m_buffer) string_file (true);
-	  length = gdbarch_print_insn (arch (), memaddr, &m_di);
+	  length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 	  gdb_assert (length > 0);
 	}
     }
@@ -1054,7 +1077,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index d9450b51231..7c74e721c57 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -263,6 +263,21 @@ struct extension_language_ops
      contents, or an empty optional.  */
   gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
 						 gdbarch *gdbarch);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassemble_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 8f39b86e952..5a805bea00e 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	= extlang->ops->print_insn (gdbarch, address, info);
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 7eb89530c44..47839ea50be 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -327,6 +327,16 @@ extern gdb::optional<std::string> ext_lang_colorize
 extern gdb::optional<std::string> ext_lang_colorize_disasm
   (const std::string &content, gdbarch *gdbarch);
 
+/* Calls extension_language_ops::print_insn for each extension language,
+   returning the result from the first extension language that returns a
+   non-empty result (any further extension languages are not then called).
+
+   All arguments are forwarded to extension_language_ops::print_insn, see
+   that function for a full description.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #if GDB_SELF_TEST
 namespace selftests {
 extern void (*hook_set_active_ext_lang) ();
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index c7be48fb739..14b191ded62 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 11aaa7ae778..b5b8379e23c 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -166,6 +166,8 @@ static const struct extension_language_ops python_extension_ops =
   gdbpy_colorize,
 
   gdbpy_colorize_disasm,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 #endif /* HAVE_PYTHON */
-- 
2.25.4



* [PATCHv6 4/6] gdb/python: implement the print_insn extension language hook
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
                             ` (2 preceding siblings ...)
  2022-05-25 10:49           ` [PATCHv6 3/6] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-05-25 10:49           ` Andrew Burgess
  2022-05-25 13:32             ` Eli Zaretskii
  2022-05-25 10:49           ` [PATCHv6 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
                             ` (2 subsequent siblings)
  6 siblings, 1 reply; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to libopcodes disassemble_info structure, has read-only
  properties: address, architecture, and progspace.  And has methods:
  __init__, read_memory, and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user written disassembler function, by
  reading the properties, and calling the methods (and other support
  methods in the gdb.disassembler module) the user can perform and
  return the disassembly.

  Disassembler

      This is a base-class which user written disassemblers should
  inherit from.  This base class provides base implementations of
  __init__ and __call__ which the user written disassembler should
  override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler, it's really just a wrapper around a string (the text
  of the disassembled instruction) and a length (in bytes).  The user
  can return an instance of this class from Disassembler.__call__ to
  represent the newly disassembled instruction.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, thus the user can adjust what GDB
  sees when it is disassembling.

The included documentation provides a more detailed description of the
API.
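
The augmenting use case mentioned above might look something like the
following sketch.  To keep it runnable outside GDB, the builtin
disassembler and the result class are passed in as arguments; inside
GDB they would be gdb.disassembler.builtin_disassemble and
gdb.disassembler.DisassemblerResult.  The helper name, the Fake*
stand-ins, and the "## addr" comment format are all made up for
illustration.

```python
from collections import namedtuple


def make_annotating_disassembler(builtin_disassemble, result_cls):
    """Build a __call__-style function that appends a comment.

    BUILTIN_DISASSEMBLE and RESULT_CLS are injected so this sketch can
    run without GDB; they are assumed to behave like
    builtin_disassemble (INFO) and DisassemblerResult (LENGTH, STRING).
    """

    def disassemble(info):
        # Let the builtin disassembler do the real work.
        result = builtin_disassemble(info)
        # Append an arbitrary annotation; everything stays on one line.
        text = "%s\t## addr = 0x%x" % (result.string, info.address)
        return result_cls(result.length, text)

    return disassemble


# Hypothetical stand-ins so the sketch can be exercised without GDB.
FakeResult = namedtuple("FakeResult", ["length", "string"])
FakeInfo = namedtuple("FakeInfo", ["address"])


def fake_builtin(info):
    """Pretend every instruction is a one byte nop."""
    return FakeResult(1, "nop")


annotate = make_annotating_disassembler(fake_builtin, FakeResult)
print(annotate(FakeInfo(0x401000)).string)
```

In a real script the inner function body would be the __call__ method
of a Disassembler sub-class.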

There is also a new CLI command added:

  maint info python-disassemblers

This command is defined in the Python gdb.disassembler module, and
can be used to list the currently registered Python disassemblers.
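
For reference, the registration and lookup rules described in the
documentation can be modelled in a few lines of Python.  This is only a
sketch of the documented behaviour, not the code in
gdb/python/lib/gdb/disassembler.py; the _registry dictionary and the
lookup_disassembler helper are assumptions made for the example.

```python
# Maps an architecture name to a disassembler.  The key None holds the
# global disassembler that applies to all architectures.
_registry = {}


def register_disassembler(disassembler, architecture=None):
    """Register DISASSEMBLER, returning whatever was registered before.

    Passing None as DISASSEMBLER removes any existing registration for
    ARCHITECTURE.
    """
    previous = _registry.get(architecture)
    if disassembler is None:
        _registry.pop(architecture, None)
    else:
        _registry[architecture] = disassembler
    return previous


def lookup_disassembler(architecture):
    """Prefer an architecture specific entry, else the global one."""
    if architecture in _registry:
        return _registry[architecture]
    return _registry.get(None)
```

Because the two kinds of entry live under different keys, the order in
which they are registered does not affect which one is selected.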
---
 gdb/Makefile.in                        |    1 +
 gdb/NEWS                               |   34 +
 gdb/data-directory/Makefile.in         |    1 +
 gdb/doc/gdb.texinfo                    |   45 +
 gdb/doc/python.texi                    |  328 +++++++
 gdb/python/lib/gdb/disassembler.py     |  178 ++++
 gdb/python/py-disasm.c                 | 1090 ++++++++++++++++++++++++
 gdb/python/python-internal.h           |   23 +
 gdb/python/python.c                    |    3 +-
 gdb/testsuite/gdb.python/py-disasm.c   |   25 +
 gdb/testsuite/gdb.python/py-disasm.exp |  209 +++++
 gdb/testsuite/gdb.python/py-disasm.py  |  712 ++++++++++++++++
 12 files changed, 2648 insertions(+), 1 deletion(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index 418094775a5..42a0ebb371b 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index a72fee81550..f5ed294fe8f 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -41,6 +41,40 @@ maintenance info line-table
      This is the same format that GDB uses when printing address, symbol,
      and offset information from the disassembler.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is an instance of a sub-class of
+       gdb.disassembler.Disassembler.  ARCH is either None or a string
+       containing a bfd architecture name.  DISASSEMBLER is registered
+       as a disassembler for architecture ARCH, or for all
+       architectures if ARCH is None.  The previous disassembler
+       registered for ARCH is returned, or None if there was none.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'architecture', and 'progspace', as well
+       as the following methods: 'read_memory' and 'is_valid'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index e5c1ee33aac..7945f863612 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -39544,6 +39544,51 @@
 @item maint info jit
 Print information about JIT code objects loaded in the current inferior.
 
+@anchor{maint info python-disassemblers}
+@kindex maint info python-disassemblers
+@item maint info python-disassemblers
+This command is defined within the @code{gdb.disassembler} Python
+module (@pxref{Disassembly In Python}), and will only be present after
+that module has been imported.  To force the module to be imported do
+the following:
+
+@smallexample
+(@value{GDBP}) python import gdb.disassembler
+@end smallexample
+
+This command lists all the architectures for which a disassembler is
+currently registered, and the name of the disassembler.  If a
+disassembler is registered for all architectures, then this is listed
+last against the @samp{GLOBAL} architecture.
+
+If one of the disassemblers would be selected for the architecture of
+the current inferior, then this disassembler will be marked.
+
+The following example shows a situation in which two disassemblers are
+registered, initially the @samp{i386} disassembler matches the current
+architecture, then the architecture is changed, now the @samp{GLOBAL}
+disassembler matches.
+
+@smallexample
+@group
+(@value{GDBP}) show architecture
+The target architecture is set to "auto" (currently "i386").
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassembler Name
+i386                Disassembler_1	(Matches current architecture)
+GLOBAL              Disassembler_2
+@end group
+@group
+(@value{GDBP}) set architecture arm
+The target architecture is set to "arm".
+(@value{GDBP}) maint info python-disassemblers
+quit
+Architecture        Disassembler Name
+i386                Disassembler_1
+GLOBAL              Disassembler_2	(Matches current architecture)
+@end group
+@end smallexample
+
 @kindex set displaced-stepping
 @kindex show displaced-stepping
 @cindex displaced stepping support
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index cb5283e03c0..cfe3e565bbc 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python
 @end menu
 
 @node Basic Python
@@ -598,6 +599,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3278,6 +3280,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6562,6 +6565,331 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@subsubsection Instruction Disassembly In Python
+@cindex python instruction disassembly
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.
+
+Instances of this type are usually created within @value{GDBN},
+however, it is possible to create a copy of an instance of this type,
+see the description of @code{__init__} for more details.
+
+This class has the following properties and methods:
+
+@defvar DisassembleInfo.address
+A read-only integer containing the address at which @value{GDBN}
+wishes to disassemble a single instruction.
+@end defvar
+
+@defvar DisassembleInfo.architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling, this property is
+read-only.
+@end defvar
+
+@defvar DisassembleInfo.progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling, this
+property is read-only.
+@end defvar
+
+@defun DisassembleInfo.is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object will become
+invalid once the disassembly call for which the @code{DisassembleInfo}
+was created has returned.  Calling other @code{DisassembleInfo}
+methods, or accessing @code{DisassembleInfo} properties, will raise a
+@code{RuntimeError} exception if it is invalid.
+@end defun
+
+@defun DisassembleInfo.__init__ (info)
+This can be used to create a new @code{DisassembleInfo} object that is
+a copy of @var{info}.  The copy will have the same @code{address},
+@code{architecture}, and @code{progspace} values as @var{info}, and
+will become invalid at the same time as @var{info}.
+
+This method exists so that sub-classes of @code{DisassembleInfo} can
+be created, these sub-classes must be initialized as copies of an
+existing @code{DisassembleInfo} object, but sub-classes might choose
+to override the @code{read_memory} method, and so control what
+@value{GDBN} sees when reading from memory
+(@pxref{builtin_disassemble}).
+@end defun
+
+@defun DisassembleInfo.read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory, calling this method handles this
+detail.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).  The
+length of the returned buffer will always be exactly @var{length}.
+
+If @value{GDBN} is unable to read the required memory then a
+@code{gdb.MemoryError} exception is raised (@pxref{Exception
+Handling}).
+
+This method can be overridden by a sub-class in order to control what
+@value{GDBN} sees when reading from memory
+(@pxref{builtin_disassemble}).  When overriding this method it is
+important to understand how @code{builtin_disassemble} makes use of
+this method.
+
+While disassembling a single instruction there could be multiple calls
+to this method, and the same bytes might be read multiple times.  Any
+single call might only read a subset of the total instruction bytes.
+
+If an implementation of @code{read_memory} is unable to read the
+requested memory contents, for example, if there's a request to read
+from an invalid memory address, then a @code{gdb.MemoryError} should
+be raised.
+
+Raising a @code{MemoryError} inside @code{read_memory} does not
+automatically mean a @code{MemoryError} will be raised by
+@code{builtin_disassemble}.  It is possible that @value{GDBN}'s builtin
+disassembler is probing to see how many bytes are available.  When
+@code{read_memory} raises the @code{MemoryError} the builtin
+disassembler might be able to perform a complete disassembly with the
+bytes it has available, in this case @code{builtin_disassemble} will
+not itself raise a @code{MemoryError}.
+
+Any other exception type raised in @code{read_memory} will propagate
+back and be re-raised by @code{builtin_disassemble}.
+@end defun
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defun Disassembler.__init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.
+@end defun
+
+@defun Disassembler.__call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Alternatively, this function can return a @code{DisassemblerResult}
+that represents the disassembled instruction, this type is described
+in more detail below.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory, this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Ideally, the only three outcomes from invoking @code{__call__} would
+be a return of @code{None}, a successful disassembly returned in a
+@code{DisassemblerResult}, or a @code{MemoryError} indicating that
+there was a problem reading memory.
+
+However, an implementation of @code{__call__} could fail for other
+reasons, e.g.@: some external resource required to perform
+disassembly is temporarily unavailable.  In that case, if
+@code{__call__} raises a @code{GdbError}, the exception will be
+converted to a string and printed at the end of the disassembly
+output, and the disassembly request will then stop.
+
+Any other exception type raised by the @code{__call__} method is
+considered an error in the user code, the exception will be printed to
+the error stream according to the @kbd{set python print-stack} setting
+(@pxref{set_python_print_stack,,@kbd{set python print-stack}}).
+@end defun
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class is used to hold the result of calling
+@w{@code{Disassembler.__call__}}, and represents a single disassembled
+instruction.  This class has the following properties and methods:
+
+@defun DisassemblerResult.__init__ (@var{length}, @var{string})
+Initialize an instance of this class, @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+
+@defvar DisassemblerResult.length
+A read-only property containing the length of the disassembled
+instruction in bytes, this will always be greater than zero.
+@end defvar
+
+@defvar DisassemblerResult.string
+A read-only property containing a non-empty string representing the
+disassembled instruction.
+@end defvar
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}, or @code{None}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+If @var{disassembler} is @code{None} then any disassembler currently
+registered for @var{architecture} is deregistered and returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+
+You can use the @kbd{maint info python-disassemblers} command
+(@pxref{maint info python-disassemblers}) to see which disassemblers
+have been registered.
+@end defun
+
+@anchor{builtin_disassemble}
+@defun builtin_disassemble (info)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance, or
+sub-class, of @code{DisassembleInfo}.
+
+When the builtin disassembler needs to read memory the
+@code{read_memory} method on @var{info} will be called.  By
+sub-classing @code{DisassembleInfo} and overriding the
+@code{read_memory} method, it is possible to intercept calls to
+@code{read_memory} from the builtin disassembler, and to modify the
+values returned.
+
+It is important to understand that, even when
+@code{DisassembleInfo.read_memory} raises a @code{gdb.MemoryError}, it
+is the builtin disassembler itself that reports the memory error to
+@value{GDBN}.  The reason for this is that the disassembler might
+probe memory to see if a byte is readable or not; if the byte can't be
+read then the disassembler may choose not to report an error, but
+instead to disassemble the bytes that it does have available.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned from @code{builtin_disassemble}.
+Alternatively, if something goes wrong, an exception will be raised.
+
+A @code{gdb.MemoryError} will be raised if @code{builtin_disassemble} is
+unable to read some memory that is required in order to perform
+disassembly correctly.
+
+Any exception that is not a @code{gdb.MemoryError}, that is raised in a
+call to @code{read_memory}, will pass through
+@code{builtin_disassemble}, and be visible to the caller.
+
+Finally, there are a few cases where @value{GDBN}'s builtin
+disassembler can fail for reasons that are not covered by
+@code{gdb.MemoryError}.  In these cases, a @code{gdb.GdbError} will be
+raised.  The contents of the exception will be a string describing the
+problem the disassembler encountered.
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        text = result.string + "\t## Comment"
+        return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
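+
+As a further sketch, a disassembler can react to an error raised by
+@code{builtin_disassemble} before letting it propagate.  The class
+name @code{ReportingDisassembler} and the message text are made up for
+illustration only; the example assumes that simply re-raising the
+@code{gdb.MemoryError} is the desired behaviour:
+
+@smallexample
+class ReportingDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("ReportingDisassembler")
+
+    def __call__(self, info):
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except gdb.MemoryError:
+            # Note where disassembly was attempted, then re-raise so
+            # the error propagates unchanged.
+            gdb.write("memory error disassembling at 0x%x\n"
+                      % info.address, gdb.STDERR)
+            raise
+
+gdb.disassembler.register_disassembler(ReportingDisassembler())
+@end smallexample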
+
+The following example creates a sub-class of @code{DisassembleInfo} in
+order to intercept the @code{read_memory} calls.  Within
+@code{read_memory}, any bytes read from memory have their two 4-bit
+nibbles swapped around.  This isn't a very useful adjustment, but
+serves as an example.
+
+@smallexample
+class MyInfo(gdb.disassembler.DisassembleInfo):
+    def __init__(self, info):
+        super().__init__(info)
+
+    def read_memory(self, length, offset):
+        buffer = super().read_memory(length, offset)
+        result = bytearray()
+        for b in buffer:
+            v = int.from_bytes(b, 'little')
+            v = (v << 4) & 0xf0 | (v >> 4)
+            result.append(v)
+        return memoryview(result)
+
+class NibbleSwapDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("NibbleSwapDisassembler")
+
+    def __call__(self, info):
+        info = MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+gdb.disassembler.register_disassembler(NibbleSwapDisassembler())
+@end smallexample
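+
+The registration rules described for @code{register_disassembler} can
+be sketched as follows, reusing the @code{ExampleDisassembler} class
+from the first example.  The architecture name @code{"i386:x86-64"} is
+only an assumption for illustration; use a name returned by
+@code{gdb.architecture_names} in your @value{GDBN}:
+
+@smallexample
+# Install for a single architecture.  Whatever was previously
+# registered for that architecture (possibly None) is returned.
+previous = gdb.disassembler.register_disassembler(
+    ExampleDisassembler(), "i386:x86-64")
+
+# Passing None as the disassembler deregisters it again, and
+# returns the disassembler that was just removed.
+removed = gdb.disassembler.register_disassembler(None, "i386:x86-64")
+@end smallexample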
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..5a2d94a5fac
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,178 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+# Re-export everything from the _gdb.disassembler module, which is
+# defined within GDB's C++ code.
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is either None, or an
+    instance of a sub-class of gdb.disassembler.Disassembler.
+    ARCHITECTURE is either None or the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality, this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except:
+            # It's pretty unlikely this exception case will ever
+            # trigger, one situation would be if the user somehow
+            # corrupted the _disassemblers_dict variable such that it
+            # was no longer a dictionary.
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class maint_info_py_disassemblers_cmd(gdb.Command):
+    """
+    List all registered Python disassemblers.
+
+    List the name of all registered Python disassemblers, next to the
+    name of the architecture for which the disassembler is registered.
+
+    The global Python disassembler is listed next to the string
+    'GLOBAL'.
+
+    The disassembler that matches the architecture of the currently
+    selected inferior will be marked, this is an indication of which
+    disassembler will be invoked if any disassembly is performed in
+    the current inferior.
+    """
+
+    def __init__(self):
+        super().__init__("maintenance info python-disassemblers", gdb.COMMAND_USER)
+
+    def invoke(self, args, from_tty):
+        # If no disassemblers are registered, tell the user.
+        if len(_disassemblers_dict) == 0:
+            print("No Python disassemblers registered.")
+            return
+
+        # Figure out the longest architecture name, so we can
+        # correctly format the table of results.
+        longest_arch_name = 0
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                arch_len = len(architecture)
+                if arch_len > longest_arch_name:
+                    longest_arch_name = arch_len
+
+        # Figure out the name of the current architecture.  There
+        # should always be a current inferior, but if, somehow, there
+        # isn't, then leave curr_arch as the empty string, which will
+        # not then match against any architecture in the dictionary.
+        curr_arch = ""
+        if gdb.selected_inferior() is not None:
+            curr_arch = gdb.selected_inferior().architecture().name()
+
+        # Now print the dictionary of registered disassemblers out to
+        # the user.
+        match_tag = "\t(Matches current architecture)"
+        fmt_len = max(longest_arch_name, len("Architecture"))
+        format_string = "{:" + str(fmt_len) + "s} {:s}"
+        print(format_string.format("Architecture", "Disassembler Name"))
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                name = _disassemblers_dict[architecture].name
+                if architecture == curr_arch:
+                    name += match_tag
+                    match_tag = ""
+                print(format_string.format(architecture, name))
+        if None in _disassemblers_dict:
+            name = _disassemblers_dict[None].name + match_tag
+            print(format_string.format("GLOBAL", name))
+
+
+maint_info_py_disassemblers_cmd()
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..4c78ca350c2
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,1090 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object
+{
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB, this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+
+  /* If copies of this object are created then they are chained together
+     via this NEXT pointer, this allows all the copies to be invalidated at
+     the same time as the parent object.  */
+  struct disasm_info_object *next;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object
+{
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling _gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_printing_disassembler that holds a pointer to a Python
+   DisassembleInfo object.  A pointer to an instance of this class is
+   placed in the application_data field of the disassemble_info that is
+   used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Fill in OBJ with all the other arguments.  */
+
+static void
+disasm_info_fill (disasm_info_object *obj, struct gdbarch *gdbarch,
+		  program_space *progspace, bfd_vma address,
+		  disassemble_info *di, disasm_info_object *next)
+{
+  obj->gdbarch = gdbarch;
+  obj->program_space = progspace;
+  obj->address = address;
+  obj->gdb_info = di;
+  obj->next = next;
+}
+
+/* Implement DisassembleInfo.__init__.  Takes a single argument that must
+   be another DisassembleInfo object and copies the contents from the
+   argument into this new object.  */
+
+static int
+disasm_info_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "info", NULL };
+  PyObject *info_obj;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "O!", keywords,
+					&disasm_info_object_type,
+					&info_obj))
+    return -1;
+
+  disasm_info_object *other = (disasm_info_object *) info_obj;
+  disasm_info_object *info = (disasm_info_object *) self;
+  disasm_info_fill (info, other->gdbarch, other->program_space,
+		    other->address, other->gdb_info, other->next);
+  other->next = info;
+
+  /* As the OTHER object now holds a pointer to INFO we inc the ref count
+     on INFO.  This stops INFO being deleted until OTHER has gone away.  */
+  Py_INCREF ((PyObject *) info);
+  return 0;
+}
+
+/* The tp_dealloc callback for the DisassembleInfo type.  */
+
+static void
+disasm_info_dealloc (PyObject *self)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+
+  /* We no longer care about the object our NEXT pointer points at, so we
+     can decrement its reference count.  This macro handles the case when
+     NEXT is nullptr.  */
+  Py_XDECREF ((PyObject *) obj->next);
+
+  /* Now core deallocation behaviour.  */
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  PyObject *address_obj = gdb_py_object_from_longest (address).release ();
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj);
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))				\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Initialise OBJ, a DisassemblerResult object with LENGTH and CONTENT.
+   OBJ might already have been initialised, in which case any existing
+   content should be discarded before the new CONTENT is moved in.  */
+
+static void
+disasmpy_init_disassembler_result (disasm_result_object *obj, int length,
+				   std::string content)
+{
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  else
+    obj->content->clear ();
+
+  obj->length = length;
+  *(obj->content) = std::move (content);
+}
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (disasm_info);
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  LENGTH is set to the length of
+     the disassembled instruction, or -1 if there was a memory-error
+     encountered while disassembling.  See below for more details on
+     handling of -1 return value.  */
+  int length;
+  try
+    {
+      length = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+				   disassembler.disasm_info ());
+    }
+  catch (gdbpy_err_fetch &pyerr)
+    {
+      /* Reinstall the Python exception held in PYERR.  This clears the
+	 pointers held in PYERR, hence the need to catch as a non-const
+	 reference.  */
+      pyerr.restore ();
+      return nullptr;
+    }
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers that don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 If the memory error function was called then we raise a
+	 gdb.MemoryError.  Otherwise we raise a gdb.GdbError, using any
+	 text the disassembler emitted as the message, or a generic
+	 message if it emitted nothing.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	{
+	  std::string content = disassembler.release ();
+	  if (!content.empty ())
+	    PyErr_SetString (gdbpy_gdberror_exc, content.c_str ());
+	  else
+	    PyErr_SetString (gdbpy_gdberror_exc,
+			     _("Unknown disassembly error."));
+	}
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create a DisassemblerResult containing the results.  */
+  std::string content = disassembler.release ();
+  PyTypeObject *type = &disasm_result_object_type;
+  gdbpy_ref<disasm_result_object> res
+    ((disasm_result_object *) type->tp_alloc (type, 0));
+  disasmpy_init_disassembler_result (res.get (), length, std::move (content));
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement the _gdb.disassembler._set_enabled function.  Takes a boolean
+   parameter, and sets whether GDB should enter the Python disassembler code.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback and is
+   called from the libopcodes disassembler when the disassembler wants to
+   read memory.
+
+   From the INFO argument we can find the gdbpy_disassembler object for
+   which we are disassembling, and from that object we can find the
+   DisassembleInfo for the current disassembly call.
+
+   This function reads the instruction bytes by calling the read_memory
+   method on the DisassembleInfo object.  This method might have been
+   overridden by user code.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+
+  /* The DisassembleInfo.read_memory method expects an offset from the
+     address stored within the DisassembleInfo object; calculate that
+     offset here.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+
+  /* Now call the DisassembleInfo.read_memory method.  This might have been
+     overridden by the user.  */
+  gdbpy_ref<> result_obj (PyObject_CallMethod ((PyObject *) obj,
+					       "read_memory",
+					       "KL", len, offset));
+
+  /* Handle any exceptions.  */
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read, if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  PyErr_Clear ();
+	  return -1;
+	}
+
+      /* For any other exception type we capture the value of the Python
+	 exception and throw it, this will then be caught in
+	 disasmpy_builtin_disassemble, at which point the exception will be
+	 restored.  */
+      throw gdbpy_err_fetch ();
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get(), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      throw gdbpy_err_fetch ();
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Buffer returned from read_memory is sized %zd instead of the expected %u"),
+		    py_buff.len, len);
+      throw gdbpy_err_fetch ();
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  disasmpy_init_disassembler_result (obj, length, std::string (string));
+
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying gdbpy_disassembler, and record on it the address at which
+   the memory error occurred.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    disasm_info_fill (m_disasm_info.get (), gdbarch, current_program_space,
+		      memaddr, info, nullptr);
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    /* Invalidate the original DisassembleInfo object as well as any copies
+       that the user might have made.  */
+    for (disasm_info_object *obj = m_disasm_info.get ();
+	 obj != nullptr;
+	 obj = obj->next)
+      obj->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* Import the gdb.disassembler module.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Get the _print_insn attribute from the module; this should be the
+     function we are going to call to actually perform the disassembly.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     "_print_insn"));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we report back to core GDB as
+	 an unknown error (return -1 without first calling the
+	 memory_error_func callback).  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fallback to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  gdbpy_err_fetch err;
+	  PyErr_Clear ();
+
+	  CORE_ADDR addr;
+	  if (err.value () != nullptr
+	      && PyObject_HasAttrString (err.value ().get (), "address"))
+	    {
+	      gdbpy_ref<> addr_obj
+		(PyObject_GetAttrString (err.value ().get (), "address"));
+	      if (get_addr_from_python (addr_obj.get (), &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else if (PyErr_ExceptionMatches (gdbpy_gdberror_exc))
+	{
+	  gdbpy_err_fetch err;
+	  gdb::unique_xmalloc_ptr<char> msg = err.to_string ();
+
+	  info->fprintf_func (info->stream, "%s", msg.get ());
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  gdbpy_print_stack ();
+	  return gdb::optional<int> (-1);
+	}
+
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* Check the result is a DisassemblerResult (or a sub-class).  */
+  if (!PyObject_IsInstance (result.get (),
+			    (PyObject *) &disasm_result_object_type))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("Result is not a DisassemblerResult."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("String attribute is not a string."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0)
+    {
+      PyErr_SetString
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length must be greater than 0."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+  if (length > max_insn_length)
+    {
+      PyErr_Format
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length %ld greater than architecture maximum of %ld"),
+	 length, max_insn_length);
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String attribute must not be empty."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> None\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm ()
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  disasm_info_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasm_info_dealloc,				/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasm_info_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index 79219c6bb86..bae93de45b9 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -587,6 +589,13 @@ class gdbpy_err_fetch
     return PyErr_GivenExceptionMatches (m_error_type.get (), type);
   }
 
+  /* Return a new reference to the exception value object.  */
+
+  gdbpy_ref<> value ()
+  {
+    return m_error_value;
+  }
+
 private:
 
   gdbpy_ref<> m_error_type, m_error_value, m_error_traceback;
@@ -819,4 +828,18 @@ extern bool gdbpy_is_architecture (PyObject *obj);
 
 extern bool gdbpy_is_progspace (PyObject *obj);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
diff --git a/gdb/python/python.c b/gdb/python/python.c
index b5b8379e23c..084b3687fec 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2045,6 +2045,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..1b9cd4465ac
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,209 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements: the
+# first element is the name of a disassembler class defined in
+# py-disasm.py, and the second element is a pattern that should be
+# matched in the disassembler output.
+#
+# Each disassembler tests a different feature of the Python
+# disassembler API.
+set unknown_error_pattern "unknown disassembler error \\(error = -1\\)"
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "GdbErrorEarlyDisassembler" "${addr_pattern}GdbError instead of a result\r\n${unknown_error_pattern}"] \
+	 [list "RuntimeErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'RuntimeError'>: RuntimeError instead of a result\r\n\r\n${unknown_error_pattern}"] \
+	 [list "GdbErrorLateDisassembler" "${addr_pattern}GdbError after builtin disassembler\r\n${unknown_error_pattern}"] \
+	 [list "RuntimeErrorLateDisassembler" "${addr_pattern}Python Exception <class 'RuntimeError'>: RuntimeError after builtin disassembler\r\n\r\n${unknown_error_pattern}"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "ReadMemoryMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "ReadMemoryGdbErrorDisassembler" "${addr_pattern}read_memory raised GdbError\r\n${unknown_error_pattern}"] \
+	 [list "ReadMemoryRuntimeErrorDisassembler" "${addr_pattern}Python Exception <class 'RuntimeError'>: read_memory raised RuntimeError\r\n\r\n${unknown_error_pattern}"] \
+	 [list "ReadMemoryCaughtMemoryErrorDisassembler" "${addr_pattern}nop\r\n.*"] \
+	 [list "ReadMemoryCaughtGdbErrorDisassembler" "${addr_pattern}nop\r\n.*"] \
+	 [list "ReadMemoryCaughtRuntimeErrorDisassembler" "${addr_pattern}nop\r\n.*"] \
+	 [list "MemorySourceNotABufferDisassembler" "${addr_pattern}Python Exception <class 'TypeError'>: Result from read_memory is not a buffer\r\n\r\n${unknown_error_pattern}"] \
+	 [list "MemorySourceBufferTooLongDisassembler" "${addr_pattern}Python Exception <class 'ValueError'>: Buffer returned from read_memory is sized $decimal instead of the expected $decimal\r\n\r\n${unknown_error_pattern}"] \
+	 [list "ResultOfWrongType" "${addr_pattern}Python Exception <class 'TypeError'>: Result is not a DisassemblerResult.\r\n.*"] \
+	 [list "ResultWithInvalidLength" "${addr_pattern}Python Exception <class 'ValueError'>: Invalid length attribute: length must be greater than 0.\r\n.*"] \
+	 [list "ResultWithInvalidString" "${addr_pattern}Python Exception <class 'ValueError'>: String attribute must not be empty.\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check some errors relating to DisassemblerResult creation.
+with_test_prefix "DisassemblerResult errors" {
+    gdb_test "python gdb.disassembler.DisassemblerResult(0, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(-1, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(1, '')" \
+	[multi_line \
+	     "ValueError: String must not be empty." \
+	     "Error while executing Python code."]
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python analyzing_disassembler = add_global_disassembler(AnalyzingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
+
+# Test the 'maint info python-disassemblers' command.
+with_test_prefix "maint info python-disassemblers" {
+    py_remove_all_disassemblers
+    gdb_test "maint info python-disassemblers" "No Python disassemblers registered\\." \
+	"list disassemblers, none registered"
+    gdb_test_no_output "python disasm = add_global_disassembler(BuiltinDisassembler)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "GLOBAL\\s+BuiltinDisassembler\\s+\\(Matches current architecture\\)"] \
+	"list disassemblers, single global disassembler"
+    gdb_test_no_output "python arch = gdb.selected_inferior().architecture().name()"
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(disasm, arch)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "\[^\r\n\]+BuiltinDisassembler\\s+\\(Matches current architecture\\)" \
+	     "GLOBAL\\s+BuiltinDisassembler"] \
+	"list disassemblers, multiple disassemblers registered"
+}
+
+# Check the attempt to create a "new" DisassembleInfo object fails.
+with_test_prefix "Bad DisassembleInfo creation" {
+    gdb_test_no_output "python my_info = InvalidDisassembleInfo()"
+    gdb_test "python print(my_info.is_valid())" "True"
+    gdb_test "python gdb.disassembler.builtin_disassemble(my_info)" \
+	[multi_line \
+	     "RuntimeError: DisassembleInfo is no longer valid\\." \
+	     "Error while executing Python code\\."]
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..ff7ffdb97d9
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,712 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global, holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+# Remove all currently registered disassemblers.
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super().__init__("TestDisassembler")
+        self.__info = None
+        if current_pc == None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        self.__info = info
+        return self.disassemble(info)
+
+    def get_info(self):
+        return self.__info
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        len = result.length
+        str = ""
+        for o in range(len):
+            if str != "":
+                str += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            str += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        str = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % str
+        return DisassemblerResult(result.length, text)
+
+
+class GdbErrorEarlyDisassembler(TestDisassembler):
+    """Raise a GdbError instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("GdbError instead of a result")
+
+
+class RuntimeErrorEarlyDisassembler(TestDisassembler):
+    """Raise a RuntimeError instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise RuntimeError("RuntimeError instead of a result")
+
+
+class GdbErrorLateDisassembler(TestDisassembler):
+    """Raise a GdbError after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("GdbError after builtin disassembler")
+
+
+class RuntimeErrorLateDisassembler(TestDisassembler):
+    """Raise a RuntimeError after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise RuntimeError("RuntimeError after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError as e:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class ResultOfWrongType(TestDisassembler):
+    """Return something that is not a DisassemblerResult from disassemble method"""
+
+    class Blah:
+        def __init__(self, length, string):
+            self.length = length
+            self.string = string
+
+    def disassemble(self, info):
+        return self.Blah(1, "ABC")
+
+
+class ResultWrapper(gdb.disassembler.DisassemblerResult):
+    def __init__(self, length, string, length_x=None, string_x=None):
+        super().__init__(length, string)
+        if length_x is None:
+            self.__length = length
+        else:
+            self.__length = length_x
+        if string_x is None:
+            self.__string = string
+        else:
+            self.__string = string_x
+
+    @property
+    def length(self):
+        return self.__length
+
+    @property
+    def string(self):
+        return self.__string
+
+
+class ResultWithInvalidLength(TestDisassembler):
+    """Return a result object with an invalid length."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, 0)
+
+
+class ResultWithInvalidString(TestDisassembler):
+    """Return a result object with an empty string."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, None, "")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super().__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in,
+    as well as a copy of the original DisassembleInfo.
+
+    Once the call into the disassembler is complete then the
+    DisassembleInfo objects become invalid, and any calls into them
+    should trigger an exception."""
+
+    # This is where we cache the DisassembleInfo objects.
+    cached_insn_disas = []
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas.append(info)
+        GlobalCachingDisassembler.cached_insn_disas.append(self.MyInfo(info))
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        for info in GlobalCachingDisassembler.cached_insn_disas:
+            assert isinstance(info, gdb.disassembler.DisassembleInfo)
+            assert not info.is_valid()
+            try:
+                val = info.address
+                raise gdb.GdbError("DisassembleInfo.address is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.address raised an unexpected exception"
+                )
+
+            try:
+                val = info.architecture
+                raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.architecture raised an unexpected exception"
+                )
+
+            try:
+                val = info.read_memory(1, 0)
+                raise gdb.GdbError("DisassembleInfo.read is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.read raised an unexpected exception"
+                )
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class ReadMemoryMemoryErrorDisassembler(TestDisassembler):
+    """Raise a MemoryError exception from the DisassembleInfo.read_memory
+    method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            # Throw a memory error with a specific address.  We don't
+            # expect this address to show up in the output though.
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class ReadMemoryGdbErrorDisassembler(TestDisassembler):
+    """Raise a GdbError exception from the DisassembleInfo.read_memory
+    method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("read_memory raised GdbError")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class ReadMemoryRuntimeErrorDisassembler(TestDisassembler):
+    """Raise a RuntimeError exception from the DisassembleInfo.read_memory
+    method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise RuntimeError("read_memory raised RuntimeError")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class ReadMemoryCaughtMemoryErrorDisassembler(TestDisassembler):
+    """Raise a MemoryError exception from the DisassembleInfo.read_memory
+    method, catch this in the outer disassembler."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except gdb.MemoryError:
+            return None
+
+
+class ReadMemoryCaughtGdbErrorDisassembler(TestDisassembler):
+    """Raise a GdbError exception from the DisassembleInfo.read_memory
+    method, catch this in the outer disassembler."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("exception message")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except gdb.GdbError as e:
+            if e.args[0] == "exception message":
+                return None
+            raise e
+
+
+class ReadMemoryCaughtRuntimeErrorDisassembler(TestDisassembler):
+    """Raise a RuntimeError exception from the DisassembleInfo.read_memory
+    method, catch this in the outer disassembler."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise RuntimeError("exception message")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except RuntimeError as e:
+            if e.args[0] == "exception message":
+                return None
+            raise e
+
+
+class MemorySourceNotABufferDisassembler(TestDisassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            return 1234
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceBufferTooLongDisassembler(TestDisassembler):
+    """The read memory returns too many bytes."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            buffer = super().read_memory(length, offset)
+            # Create a new memory view made by duplicating BUFFER.  This
+            # will trigger an error as GDB expects a buffer of exactly
+            # LENGTH to be returned, while this will return a buffer of
+            # 2*LENGTH.
+            return memoryview(
+                bytes([int.from_bytes(x, "little") for x in (list(buffer[0:]) * 2)])
+            )
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class BuiltinDisassembler(Disassembler):
+    """Just calls the builtin disassembler."""
+
+    def __init__(self):
+        super().__init__("BuiltinDisassembler")
+
+    def __call__(self, info):
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class AnalyzingDisassembler(Disassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        """Wrapper around builtin DisassembleInfo type that overrides the
+        read_memory method."""
+
+        def __init__(self, info, start, end, nop_bytes):
+            """INFO is the DisassembleInfo we are wrapping.  START and END are
+            addresses, and NOP_BYTES should be a memoryview object.
+
+            The length (END - START) should be the same as the length
+            of NOP_BYTES.
+
+            Any memory read requests outside the START->END range are
+            serviced normally, but any attempt to read within the
+            START->END range will return content from NOP_BYTES."""
+            super().__init__(info)
+            self._start = start
+            self._end = end
+            self._nop_bytes = nop_bytes
+
+        def _read_replacement(self, length, offset):
+            """Return a slice of the buffer representing the replacement nop
+            instructions."""
+
+            assert self._nop_bytes is not None
+            rb = self._nop_bytes
+
+            # If this request is outside of a nop instruction then we don't know
+            # what to do, so just raise a memory error.
+            if offset >= len(rb) or (offset + length) > len(rb):
+                raise gdb.MemoryError("invalid length and offset combination")
+
+            # Return only the slice of the nop instruction as requested.
+            s = offset
+            e = offset + length
+            return rb[s:e]
+
+        def read_memory(self, length, offset=0):
+            """Callback used by the builtin disassembler to read the contents of
+            memory."""
+
+            # If this request is within the region we are replacing with 'nop'
+            # instructions, then call the helper function to perform that
+            # replacement.
+            if self._start is not None:
+                assert self._end is not None
+                if self.address >= self._start and self.address < self._end:
+                    return self._read_replacement(length, offset)
+
+            # Otherwise, we just forward this request to the default read memory
+            # implementation.
+            return super().read_memory(length, offset)
+
+    def __init__(self):
+        """Constructor."""
+        super().__init__("AnalyzingDisassembler")
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Override the info object, this provides access to our
+        # read_memory function.
+        info = self.MyInfo(info, self._start, self._end, self._nop_bytes)
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            print("APB: Reading nop bytes")
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = int(self._pass_1_length[replace_idx] / self._pass_1_length[nop_idx])
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+    return dis
+
+
+class InvalidDisassembleInfo(gdb.disassembler.DisassembleInfo):
+    """An attempt to create a DisassembleInfo sub-class without calling
+    the parent class init method.
+
+    Attempts to use instances of this class should throw an error
+    saying that the DisassembleInfo is not valid, despite this class
+    having all of the required attributes.
+
+    The reason why this class will never be valid is that an internal
+    field (within the C++ code) can't be initialized without calling
+    the parent class init method."""
+
+    def __init__(self):
+        assert current_pc is not None
+
+    def is_valid(self):
+        return True
+
+    @property
+    def address(self):
+        global current_pc
+        return current_pc
+
+    @property
+    def architecture(self):
+        return gdb.selected_inferior().architecture()
+
+    @property
+    def progspace(self):
+        return gdb.selected_inferior().progspace
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4



* [PATCHv6 5/6] gdb: refactor the non-printing disassemblers
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
                             ` (3 preceding siblings ...)
  2022-05-25 10:49           ` [PATCHv6 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-05-25 10:49           ` Andrew Burgess
  2022-05-25 10:49           ` [PATCHv6 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit started from an observation I made while working on some
other disassembler patches: the function gdb_buffered_insn_length is
broken ... sort of.

I noticed that the gdb_buffered_insn_length function doesn't set up
the application_data field of the disassemble_info structure.

Further, I noticed that some architectures, for example, ARM, require
that the application_data field be set, see gdb_print_insn_arm in
arm-tdep.c.

And so, if we ever use gdb_buffered_insn_length for ARM, then GDB will
likely crash.  Which is why I said only "sort of" broken.  Right now
we don't use gdb_buffered_insn_length with ARM, so maybe it isn't
broken yet?

Anyway, to prove to myself that there was a problem here, I extended
the disassembler self tests in disasm-selftests.c to include a test of
gdb_buffered_insn_length.  Because the test runs for all architectures,
I do indeed see GDB crash for ARM.
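
To illustrate the failure mode, here is a minimal, self-contained
sketch.  The names fake_disassemble_info, fake_disassembler and
print_insn_like_arm are hypothetical stand-ins, not the real GDB or
opcodes types; they only model a print_insn callback that recovers its
owning object through application_data.

```cpp
#include <cassert>
#include <string>

// Hypothetical stand-in for the opcodes disassemble_info structure;
// only the field relevant to this discussion is modelled.
struct fake_disassemble_info
{
  void *application_data = nullptr;
};

// Hypothetical stand-in for the object stored in application_data.
struct fake_disassembler
{
  std::string arch_name;
};

// Models an architecture callback that needs its owning disassembler.
// It returns false rather than dereferencing a null pointer; a
// callback without this check would crash when application_data has
// not been set up.
static bool
print_insn_like_arm (const fake_disassemble_info *info,
		     std::string *arch_out)
{
  auto *dis = static_cast<fake_disassembler *> (info->application_data);
  if (dis == nullptr)
    return false;
  *arch_out = dis->arch_name;
  return true;
}
```

The null check exists only so the sketch can be exercised safely; the
point is that any caller which builds the info structure by hand must
also fill in application_data.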

To fix this we need gdb_buffered_insn_length to create a disassembler
that inherits from gdb_disassemble_info, but we also need this new
disassembler to not print anything.

And so, I introduce a new gdb_non_printing_disassembler class; this is
a disassembler that doesn't print anything to the output stream.

I then observed that both ARC and S12Z also create non-printing
disassemblers, but these are slightly different.  While the
disassembler in gdb_non_printing_disassembler reads the instruction
from a buffer, the ARC and S12Z disassemblers read from target memory
using target_read_code.

And so, I further split gdb_non_printing_disassembler into two
sub-classes, gdb_non_printing_memory_disassembler and
gdb_non_printing_buffer_disassembler.
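
The shape of that split can be sketched roughly as follows.  This is
not the code from disasm.h; every class name ending in _sketch, and
the reader_fn type, are made up for illustration.  The sketch only
shows a shared base that discards output, with one sub-class reading
from a buffer and the other reading through a callback that stands in
for a target memory read.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Hypothetical base class: a disassembler that discards all output.
// Sub-classes decide where the instruction bytes come from.
class non_printing_disassembler_sketch
{
public:
  virtual ~non_printing_disassembler_sketch () = default;

  // Copy LEN bytes starting at ADDR into OUT.  Returns 0 on success
  // and non-zero on failure.
  virtual int read_memory (std::uint64_t addr, std::uint8_t *out,
			   std::size_t len) = 0;

  // The "print" callback: accept any text and drop it.
  int print (const char *) { return 0; }
};

// Reads instruction bytes from a caller-supplied buffer, which is
// treated as starting at address BASE.
class buffer_disassembler_sketch : public non_printing_disassembler_sketch
{
public:
  buffer_disassembler_sketch (std::uint64_t base,
			      std::vector<std::uint8_t> bytes)
    : m_base (base), m_bytes (std::move (bytes))
  {}

  int read_memory (std::uint64_t addr, std::uint8_t *out,
		   std::size_t len) override
  {
    if (addr < m_base)
      return 1;
    std::uint64_t off = addr - m_base;
    if (off > m_bytes.size () || len > m_bytes.size () - off)
      return 1;
    std::memcpy (out, m_bytes.data () + off, len);
    return 0;
  }

private:
  std::uint64_t m_base;
  std::vector<std::uint8_t> m_bytes;
};

// Reads instruction bytes through a function pointer, standing in
// for a read from target memory.
class memory_disassembler_sketch : public non_printing_disassembler_sketch
{
public:
  using reader_fn = int (*) (std::uint64_t, std::uint8_t *, std::size_t);

  explicit memory_disassembler_sketch (reader_fn reader)
    : m_reader (reader)
  {}

  int read_memory (std::uint64_t addr, std::uint8_t *out,
		   std::size_t len) override
  { return m_reader (addr, out, len); }

private:
  reader_fn m_reader;
};
```

Keeping the output handling in the base class means the two
sub-classes differ only in their memory source, which mirrors the
reason for the split described above.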

The new selftests now pass, but otherwise, there should be no
user-visible changes after this commit.
---
 gdb/arc-linux-tdep.c   | 15 +++----
 gdb/arc-tdep.c         | 29 +++-----------
 gdb/arc-tdep.h         |  5 ---
 gdb/disasm-selftests.c | 70 ++++++++++++++++++++++++++-------
 gdb/disasm.c           | 88 ++++++++++++++++++------------------------
 gdb/disasm.h           | 56 ++++++++++++++++++++++++---
 gdb/s12z-tdep.c        | 26 +------------
 7 files changed, 158 insertions(+), 131 deletions(-)

diff --git a/gdb/arc-linux-tdep.c b/gdb/arc-linux-tdep.c
index 13595f2e8e9..04ca38f1355 100644
--- a/gdb/arc-linux-tdep.c
+++ b/gdb/arc-linux-tdep.c
@@ -356,7 +356,7 @@ arc_linux_sw_breakpoint_from_kind (struct gdbarch *gdbarch,
    */
 
 static std::vector<CORE_ADDR>
-handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
+handle_atomic_sequence (arc_instruction insn, disassemble_info *di)
 {
   const int atomic_seq_len = 24;    /* Instruction sequence length.  */
   std::vector<CORE_ADDR> next_pcs;
@@ -374,7 +374,7 @@ handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
   for (int insn_count = 0; insn_count < atomic_seq_len; ++insn_count)
     {
       arc_insn_decode (arc_insn_get_linear_next_pc (insn),
-		       &di, arc_delayed_print_insn, &insn);
+		       di, arc_delayed_print_insn, &insn);
 
       if (insn.insn_class == BRCC)
         {
@@ -412,15 +412,15 @@ arc_linux_software_single_step (struct regcache *regcache)
 {
   struct gdbarch *gdbarch = regcache->arch ();
   arc_gdbarch_tdep *tdep = (arc_gdbarch_tdep *) gdbarch_tdep (gdbarch);
-  struct disassemble_info di = arc_disassemble_info (gdbarch);
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   /* Read current instruction.  */
   struct arc_instruction curr_insn;
-  arc_insn_decode (regcache_read_pc (regcache), &di, arc_delayed_print_insn,
-		   &curr_insn);
+  arc_insn_decode (regcache_read_pc (regcache), dis.disasm_info (),
+		   arc_delayed_print_insn, &curr_insn);
 
   if (curr_insn.insn_class == LLOCK)
-    return handle_atomic_sequence (curr_insn, di);
+    return handle_atomic_sequence (curr_insn, dis.disasm_info ());
 
   CORE_ADDR next_pc = arc_insn_get_linear_next_pc (curr_insn);
   std::vector<CORE_ADDR> next_pcs;
@@ -431,7 +431,8 @@ arc_linux_software_single_step (struct regcache *regcache)
   if (curr_insn.has_delay_slot)
     {
       struct arc_instruction next_insn;
-      arc_insn_decode (next_pc, &di, arc_delayed_print_insn, &next_insn);
+      arc_insn_decode (next_pc, dis.disasm_info (), arc_delayed_print_insn,
+		       &next_insn);
       next_pcs.push_back (arc_insn_get_linear_next_pc (next_insn));
     }
   else
diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
index 98bd1c4bc0a..75fd3077ca7 100644
--- a/gdb/arc-tdep.c
+++ b/gdb/arc-tdep.c
@@ -1306,24 +1306,6 @@ arc_is_in_prologue (struct gdbarch *gdbarch, const struct arc_instruction &insn,
   return false;
 }
 
-/* See arc-tdep.h.  */
-
-struct disassemble_info
-arc_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
 /* Analyze the prologue and update the corresponding frame cache for the frame
    unwinder for unwinding frames that doesn't have debug info.  In such
    situation GDB attempts to parse instructions in the prologue to understand
@@ -1394,9 +1376,10 @@ arc_analyze_prologue (struct gdbarch *gdbarch, const CORE_ADDR entrypoint,
   while (current_prologue_end < limit_pc)
     {
       struct arc_instruction insn;
-      struct disassemble_info di = arc_disassemble_info (gdbarch);
-      arc_insn_decode (current_prologue_end, &di, arc_delayed_print_insn,
-		       &insn);
+
+      struct gdb_non_printing_memory_disassembler dis (gdbarch);
+      arc_insn_decode (current_prologue_end, dis.disasm_info (),
+		       arc_delayed_print_insn, &insn);
 
       if (arc_debug)
 	arc_insn_dump (insn);
@@ -2460,8 +2443,8 @@ dump_arc_instruction_command (const char *args, int from_tty)
 
   CORE_ADDR address = value_as_address (val);
   struct arc_instruction insn;
-  struct disassemble_info di = arc_disassemble_info (target_gdbarch ());
-  arc_insn_decode (address, &di, arc_delayed_print_insn, &insn);
+  struct gdb_non_printing_memory_disassembler dis (target_gdbarch ());
+  arc_insn_decode (address, dis.disasm_info (), arc_delayed_print_insn, &insn);
   arc_insn_dump (insn);
 }
 
diff --git a/gdb/arc-tdep.h b/gdb/arc-tdep.h
index ceca003204f..53e5d8476fc 100644
--- a/gdb/arc-tdep.h
+++ b/gdb/arc-tdep.h
@@ -186,11 +186,6 @@ arc_arch_is_em (const struct bfd_arch_info* arch)
    can't be set to an actual NULL value - that would cause a crash.  */
 int arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info);
 
-/* Return properly initialized disassemble_info for ARC disassembler - it will
-   not print disassembled instructions to stderr.  */
-
-struct disassemble_info arc_disassemble_info (struct gdbarch *gdbarch);
-
 /* Get branch/jump target address for the INSN.  Note that this function
    returns branch target and doesn't evaluate if this branch is taken or not.
    For the indirect jumps value depends in register state, hence can change.
diff --git a/gdb/disasm-selftests.c b/gdb/disasm-selftests.c
index 928d26f7018..07586f04abd 100644
--- a/gdb/disasm-selftests.c
+++ b/gdb/disasm-selftests.c
@@ -25,13 +25,19 @@
 
 namespace selftests {
 
-/* Test disassembly of one instruction.  */
+/* Return a pointer to a buffer containing an instruction that can be
+   disassembled for architecture GDBARCH.  *LEN will be set to the length
+   of the returned buffer.
 
-static void
-print_one_insn_test (struct gdbarch *gdbarch)
+   If there's no known instruction to disassemble for GDBARCH (because we
+   haven't figured one out, not because no instructions exist) then nullptr
+   is returned, and *LEN is set to 0.  */
+
+static const gdb_byte *
+get_test_insn (struct gdbarch *gdbarch, size_t *len)
 {
-  size_t len = 0;
-  const gdb_byte *insn = NULL;
+  *len = 0;
+  const gdb_byte *insn = nullptr;
 
   switch (gdbarch_bfd_arch_info (gdbarch)->arch)
     {
@@ -40,34 +46,34 @@ print_one_insn_test (struct gdbarch *gdbarch)
       static const gdb_byte bfin_insn[] = {0x17, 0xe1, 0xff, 0xff};
 
       insn = bfin_insn;
-      len = sizeof (bfin_insn);
+      *len = sizeof (bfin_insn);
       break;
     case bfd_arch_arm:
       /* mov     r0, #0 */
       static const gdb_byte arm_insn[] = {0x0, 0x0, 0xa0, 0xe3};
 
       insn = arm_insn;
-      len = sizeof (arm_insn);
+      *len = sizeof (arm_insn);
       break;
     case bfd_arch_ia64:
     case bfd_arch_mep:
     case bfd_arch_mips:
     case bfd_arch_tic6x:
     case bfd_arch_xtensa:
-      return;
+      return insn;
     case bfd_arch_s390:
       /* nopr %r7 */
       static const gdb_byte s390_insn[] = {0x07, 0x07};
 
       insn = s390_insn;
-      len = sizeof (s390_insn);
+      *len = sizeof (s390_insn);
       break;
     case bfd_arch_xstormy16:
       /* nop */
       static const gdb_byte xstormy16_insn[] = {0x0, 0x0};
 
       insn = xstormy16_insn;
-      len = sizeof (xstormy16_insn);
+      *len = sizeof (xstormy16_insn);
       break;
     case bfd_arch_nios2:
     case bfd_arch_score:
@@ -78,13 +84,13 @@ print_one_insn_test (struct gdbarch *gdbarch)
       {
 	int bplen;
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, 4, &bplen);
-	len = bplen;
+	*len = bplen;
       }
       break;
     case bfd_arch_arc:
       /* PR 21003 */
       if (gdbarch_bfd_arch_info (gdbarch)->mach == bfd_mach_arc_arc601)
-	return;
+	return insn;
       goto generic_case;
     case bfd_arch_i386:
       {
@@ -93,7 +99,7 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	   opcodes rejects an attempt to disassemble for an arch with
 	   a 64-bit address size when bfd_vma is 32-bit.  */
 	if (info->bits_per_address > sizeof (bfd_vma) * CHAR_BIT)
-	  return;
+	  return insn;
       }
       /* fall through */
     default:
@@ -105,12 +111,26 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	int bplen;
 
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, kind, &bplen);
-	len = bplen;
+	*len = bplen;
 
 	break;
       }
     }
-  SELF_CHECK (len > 0);
+  SELF_CHECK (*len > 0);
+
+  return insn;
+}
+
+/* Test disassembly of one instruction.  */
+
+static void
+print_one_insn_test (struct gdbarch *gdbarch)
+{
+  size_t len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &len);
+
+  if (insn == nullptr)
+    return;
 
   /* Test gdb_disassembler for a given gdbarch by reading data from a
      pre-allocated buffer.  If you want to see the disassembled
@@ -175,6 +195,24 @@ print_one_insn_test (struct gdbarch *gdbarch)
   SELF_CHECK (di.print_insn (0) == len);
 }
 
+/* Test the gdb_buffered_insn_length function.  */
+
+static void
+buffered_insn_length_test (struct gdbarch *gdbarch)
+{
+  size_t buf_len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &buf_len);
+
+  if (insn == nullptr)
+    return;
+
+  CORE_ADDR insn_address = 0;
+  int calculated_len = gdb_buffered_insn_length (gdbarch, insn, buf_len,
+						 insn_address);
+
+  SELF_CHECK (calculated_len == buf_len);
+}
+
 /* Test disassembly on memory error.  */
 
 static void
@@ -235,4 +273,6 @@ _initialize_disasm_selftests ()
 					 selftests::print_one_insn_test);
   selftests::register_test_foreach_arch ("memory_error",
 					 selftests::memory_error_test);
+  selftests::register_test_foreach_arch ("buffered_insn_length",
+					 selftests::buffered_insn_length_test);
 }
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 4af40c916b2..53cd6f5b6bb 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -1003,66 +1003,56 @@ gdb_insn_length (struct gdbarch *gdbarch, CORE_ADDR addr)
   return gdb_print_insn (gdbarch, addr, &null_stream, NULL);
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything.  Always returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (2, 3)
-gdb_disasm_null_printf (void *stream, const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_func (void *stream,
+						  const char *format, ...)
 {
   return 0;
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything, and the disassembler is using style.  Always
-   returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (3, 4)
-gdb_disasm_null_styled_printf (void *stream,
-			       enum disassembler_style style,
-			       const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_styled_func
+  (void *stream, enum disassembler_style style, const char *format, ...)
 {
   return 0;
 }
 
 /* See disasm.h.  */
 
-void
-init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
+int
+gdb_non_printing_memory_disassembler::dis_asm_read_memory
+  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+   struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
-			 gdb_disasm_null_styled_printf);
+  return target_read_code (memaddr, myaddr, length);
 }
 
-/* Initialize a struct disassemble_info for gdb_buffered_insn_length.
-   Upon return, *DISASSEMBLER_OPTIONS_HOLDER owns the string pointed
-   to by DI.DISASSEMBLER_OPTIONS.  */
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from a buffer passed to the constructor.  */
 
-static void
-gdb_buffered_insn_length_init_dis (struct gdbarch *gdbarch,
-				   struct disassemble_info *di,
-				   const gdb_byte *insn, int max_len,
-				   CORE_ADDR addr,
-				   std::string *disassembler_options_holder)
+struct gdb_non_printing_buffer_disassembler
+  : public gdb_non_printing_disassembler
 {
-  init_disassemble_info_for_no_printing (di);
-
-  /* init_disassemble_info installs buffer_read_memory, etc.
-     so we don't need to do that here.
-     The cast is necessary until disassemble_info is const-ified.  */
-  di->buffer = (gdb_byte *) insn;
-  di->buffer_length = max_len;
-  di->buffer_vma = addr;
-
-  di->arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di->mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di->endian = gdbarch_byte_order (gdbarch);
-  di->endian_code = gdbarch_byte_order_for_code (gdbarch);
-
-  *disassembler_options_holder = get_all_disassembler_options (gdbarch);
-  if (!disassembler_options_holder->empty ())
-    di->disassembler_options = disassembler_options_holder->c_str ();
-  disassemble_init_for_target (di);
-}
+  /* Constructor.  GDBARCH is the architecture to disassemble for, BUFFER
+     contains the instruction to disassemble, and INSN_ADDRESS is the
+     address (in target memory) of the instruction to disassemble.  */
+  gdb_non_printing_buffer_disassembler (struct gdbarch *gdbarch,
+					gdb::array_view<const gdb_byte> buffer,
+					CORE_ADDR insn_address)
+    : gdb_non_printing_disassembler (gdbarch, nullptr)
+  {
+    /* The cast is necessary until disassemble_info is const-ified.  */
+    m_di.buffer = (gdb_byte *) buffer.data ();
+    m_di.buffer_length = buffer.size ();
+    m_di.buffer_vma = insn_address;
+  }
+};
 
 /* Return the length in bytes of INSN.  MAX_LEN is the size of the
    buffer containing INSN.  */
@@ -1071,14 +1061,10 @@ int
 gdb_buffered_insn_length (struct gdbarch *gdbarch,
 			  const gdb_byte *insn, int max_len, CORE_ADDR addr)
 {
-  struct disassemble_info di;
-  std::string disassembler_options_holder;
-
-  gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
-				     &disassembler_options_holder);
-
-  int result = gdb_print_insn_1 (gdbarch, addr, &di);
-  disassemble_free_target (&di);
+  gdb::array_view<const gdb_byte> buffer
+    = gdb::make_array_view (insn, max_len);
+  gdb_non_printing_buffer_disassembler dis (gdbarch, buffer, addr);
+  int result = gdb_print_insn_1 (gdbarch, addr, dis.disasm_info ());
   return result;
 }
 
diff --git a/gdb/disasm.h b/gdb/disasm.h
index f31ca92b038..ec5120351a1 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -136,6 +136,56 @@ struct gdb_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* A basic disassembler that doesn't actually print anything.  */
+
+struct gdb_non_printing_disassembler : public gdb_disassemble_info
+{
+  gdb_non_printing_disassembler (struct gdbarch *gdbarch,
+				 read_memory_ftype read_memory_func)
+    : gdb_disassemble_info (gdbarch, nullptr /* stream */,
+			    read_memory_func,
+			    nullptr /* memory_error_func */,
+			    nullptr /* print_address_func */,
+			    null_fprintf_func,
+			    null_fprintf_styled_func)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_styled_func (void *stream,
+				       enum disassembler_style style,
+				       const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from target memory.  */
+
+struct gdb_non_printing_memory_disassembler
+  : public gdb_non_printing_disassembler
+{
+  /* Constructor.  GDBARCH is the architecture to disassemble for.  */
+  gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
+    :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
@@ -278,10 +328,4 @@ extern char *get_disassembler_options (struct gdbarch *gdbarch);
 
 extern void set_disassembler_options (const char *options);
 
-/* Setup DINFO with its output function and output stream setup so that
-   nothing is printed while disassembling.  */
-
-extern void init_disassemble_info_for_no_printing
-  (struct disassemble_info *dinfo);
-
 #endif
diff --git a/gdb/s12z-tdep.c b/gdb/s12z-tdep.c
index 5394c1bbf5e..4e33faaea9a 100644
--- a/gdb/s12z-tdep.c
+++ b/gdb/s12z-tdep.c
@@ -141,27 +141,6 @@ s12z_dwarf_reg_to_regnum (struct gdbarch *gdbarch, int num)
 
 /* Support functions for frame handling.  */
 
-
-/* Return a disassemble_info initialized for s12z disassembly, however,
-   the disassembler will not actually print anything.  */
-
-static struct disassemble_info
-s12z_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
-
 /* A struct (based on mem_read_abstraction_base) to read memory
    through the disassemble_info API.  */
 struct mem_read_abstraction
@@ -332,15 +311,14 @@ s12z_frame_cache (struct frame_info *this_frame, void **prologue_cache)
   int frame_size = 0;
   int saved_frame_size = 0;
 
-  struct disassemble_info di = s12z_disassemble_info (gdbarch);
-
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   struct mem_read_abstraction mra;
   mra.base.read = (int (*)(mem_read_abstraction_base*,
 			   int, size_t, bfd_byte*)) abstract_read_memory;
   mra.base.advance = advance ;
   mra.base.posn = posn;
-  mra.info = &di;
+  mra.info = dis.disasm_info ();
 
   while (this_pc > addr)
     {
-- 
2.25.4


* [PATCHv6 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
                             ` (4 preceding siblings ...)
  2022-05-25 10:49           ` [PATCHv6 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
@ 2022-05-25 10:49           ` Andrew Burgess
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
  6 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-05-25 10:49 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

After the recent restructuring of the disassembler code, GDB has ended
up with two static member functions, both called dis_asm_read_memory,
with identical implementations.

My first thought was to move these out of their respective classes,
and just make them global functions, then I'd only need a single
copy.

And maybe that's the right way to go.  But I disliked that doing so
would lose the encapsulation of the method within the corresponding
disassembler class.

So, instead, I placed the static method into its own class, and had
both the gdb_non_printing_memory_disassembler and gdb_disassembler
classes inherit from this new class as an additional base-class.

In terms of code generated, I don't think there's any significant
difference with this approach, but I think this better reflects how
the function is closely tied to the disassembler.
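
As a rough illustration of that structure, here is a standalone sketch
(not the GDB code).  The names info_base, memory_reader, non_printing,
printing, and fake_memory are made-up stand-ins for
gdb_disassemble_info, gdb_disassembler_memory_reader, the two
disassembler classes, and target memory; the callback signature is
simplified too.  It shows a static callback living in one helper class
and being shared through private inheritance:

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>

/* Hypothetical callback type, loosely modelled on read_memory_func.  */
using read_memory_ftype = int (*) (std::size_t addr, unsigned char *buf,
                                   std::size_t len);

/* Made-up "target memory" so the sketch is self-contained.  */
static const unsigned char fake_memory[] = { 0x90, 0x90, 0xc3, 0x00 };

/* Helper base class: holds the single copy of the static callback.  */
struct memory_reader
{
  static int read_memory (std::size_t addr, unsigned char *buf,
                          std::size_t len)
  {
    if (addr > sizeof (fake_memory) || len > sizeof (fake_memory) - addr)
      return -1;  /* Read out of range.  */
    std::memcpy (buf, fake_memory + addr, len);
    return 0;
  }
};

/* Stand-in for the common base that stores the callback.  */
struct info_base
{
  explicit info_base (read_memory_ftype func) : m_read (func) {}
  read_memory_ftype reader () const { return m_read; }

private:
  read_memory_ftype m_read;
};

/* Two otherwise unrelated classes that share the callback.  */
struct non_printing : public info_base, private memory_reader
{
  non_printing () : info_base (read_memory) {}
};

struct printing : public info_base, private memory_reader
{
  printing () : info_base (read_memory) {}
};

/* Both classes hand the same function to their base.  */
static void
check_shared_reader ()
{
  non_printing a;
  printing b;
  assert (a.reader () == b.reader ());
}
```

Both derived classes pass the same function pointer to their base, so
only one copy of the callback exists, and the private inheritance keeps
the name out of each class's public interface.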

There should be no user-visible changes after this commit.
---
 gdb/disasm.c | 16 +++-------------
 gdb/disasm.h | 29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 53cd6f5b6bb..c6edc92930d 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -132,9 +132,9 @@ line_has_code_p (htab_t table, struct symtab *symtab, int line)
 /* Wrapper of target_read_code.  */
 
 int
-gdb_disassembler::dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				       unsigned int len,
-				       struct disassemble_info *info)
+gdb_disassembler_memory_reader::dis_asm_read_memory
+  (bfd_vma memaddr, gdb_byte *myaddr, unsigned int len,
+   struct disassemble_info *info)
 {
   return target_read_code (memaddr, myaddr, len);
 }
@@ -1021,16 +1021,6 @@ gdb_non_printing_disassembler::null_fprintf_styled_func
   return 0;
 }
 
-/* See disasm.h.  */
-
-int
-gdb_non_printing_memory_disassembler::dis_asm_read_memory
-  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
-   struct disassemble_info *dinfo)
-{
-  return target_read_code (memaddr, myaddr, length);
-}
-
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
diff --git a/gdb/disasm.h b/gdb/disasm.h
index ec5120351a1..da03e130526 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -165,31 +165,39 @@ struct gdb_non_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* This is a helper class, for use as an additional base-class, by some of
+   the disassembler classes below.  This class just defines a static method
+   for reading from target memory, which can then be used by the various
+   disassembler sub-classes.  */
+
+struct gdb_disassembler_memory_reader
+{
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
    read from target memory.  */
 
 struct gdb_non_printing_memory_disassembler
-  : public gdb_non_printing_disassembler
+  : public gdb_non_printing_disassembler,
+    private gdb_disassembler_memory_reader
 {
   /* Constructor.  GDBARCH is the architecture to disassemble for.  */
   gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
     :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
   { /* Nothing.  */ }
-
-private:
-
-  /* Implements the read_memory_func disassemble_info callback.  */
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
 };
 
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
-struct gdb_disassembler : public gdb_printing_disassembler
+struct gdb_disassembler : public gdb_printing_disassembler,
+			  private gdb_disassembler_memory_reader
 {
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
     : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
@@ -239,9 +247,6 @@ struct gdb_disassembler : public gdb_printing_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
   static void dis_asm_memory_error (int err, bfd_vma memaddr,
 				    struct disassemble_info *info);
   static void dis_asm_print_address (bfd_vma addr,
-- 
2.25.4


* Re: [PATCHv6 4/6] gdb/python: implement the print_insn extension language hook
  2022-05-25 10:49           ` [PATCHv6 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-05-25 13:32             ` Eli Zaretskii
  0 siblings, 0 replies; 80+ messages in thread
From: Eli Zaretskii @ 2022-05-25 13:32 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: gdb-patches, andrew.burgess

> Date: Wed, 25 May 2022 11:49:53 +0100
> From: Andrew Burgess via Gdb-patches <gdb-patches@sourceware.org>
> Cc: Andrew Burgess <andrew.burgess@embecosm.com>
> 
>  gdb/Makefile.in                        |    1 +
>  gdb/NEWS                               |   34 +
>  gdb/data-directory/Makefile.in         |    1 +
>  gdb/doc/gdb.texinfo                    |   45 +
>  gdb/doc/python.texi                    |  328 +++++++
>  gdb/python/lib/gdb/disassembler.py     |  178 ++++
>  gdb/python/py-disasm.c                 | 1090 ++++++++++++++++++++++++
>  gdb/python/python-internal.h           |   23 +
>  gdb/python/python.c                    |    3 +-
>  gdb/testsuite/gdb.python/py-disasm.c   |   25 +
>  gdb/testsuite/gdb.python/py-disasm.exp |  209 +++++
>  gdb/testsuite/gdb.python/py-disasm.py  |  712 ++++++++++++++++
>  12 files changed, 2648 insertions(+), 1 deletion(-)
>  create mode 100644 gdb/python/lib/gdb/disassembler.py
>  create mode 100644 gdb/python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
>  create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

Thanks, the documentation parts are OK.  (I reviewed only the parts
that you said were modified since v5.)

* [PUSHED 0/6] Add Python API for the disassembler
  2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
                             ` (5 preceding siblings ...)
  2022-05-25 10:49           ` [PATCHv6 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
@ 2022-06-15  9:04           ` Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
                               ` (5 more replies)
  6 siblings, 6 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

I've now pushed this series.  After rebasing there was a minor change
needed in patch #5 to take account of recent changes to
disasm-selftests.c.  All other patches were unchanged.

Changes in v6:

  See: https://sourceware.org/pipermail/gdb-patches/2022-May/189391.html

  - Patch #1 is new, this is required to support changes I made in
    patch #4.

  - Patches #2, #3, #5, and #6 are unchanged since v5.

  - Patch #4 has changed:

    + In gdbpy_print_insn, we now report an error (-1) back to core
      GDB in more cases,

    + In disasmpy_builtin_disassemble we now catch gdbpy_err_fetch
      objects thrown as exceptions and restore them, this allows
      Python exceptions to propagate from
      gdbpy_disassembler::read_memory_func back to the user's Python
      code.  Additionally, we now raise a gdb.GdbError in this
      function if the builtin disassembler has not registered a memory
      error.

    + In gdbpy_disassembler::read_memory_func, we capture any
      exception that is not a gdb.MemoryError and throw it using a
      gdbpy_err_fetch object, this will be caught in
      disasmpy_builtin_disassemble and restored.

    + Tests have been updated and expanded to take account of the new
      exception handling behaviour.  Only tests that exercised the
      exception handling code needed to change, which I was pleased
      about.

    + DOCS - There have been changes to the docs:

      * The DisassembleInfo description has been reordered, and the
        description of DisassembleInfo.read_memory has been mostly
        rewritten.

      * The description of Disassembler.__call__ has been mostly
        rewritten.

      * The description of builtin_disassemble has been mostly
        rewritten.
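
The exception flow described for patch #4 above can be sketched,
very roughly and with no Python involved, as follows.  Everything here
is a made-up stand-in: pending_error plays the role of the
interpreter's error indicator, saved_error the role of
gdbpy_err_fetch, and read_memory_callback / outer_disassemble the
roles of gdbpy_disassembler::read_memory_func and
disasmpy_builtin_disassemble.  The real code differs in detail:

```cpp
#include <cassert>
#include <string>
#include <utility>

/* Toy stand-in for the interpreter's pending-error slot.  An empty
   string means no error is set.  */
static std::string pending_error;

/* Made-up stand-in for gdbpy_err_fetch: the constructor takes the
   pending error, restore () puts it back.  */
class saved_error
{
public:
  saved_error () : m_message (std::move (pending_error))
  {
    pending_error.clear ();
  }

  void restore () { pending_error = std::move (m_message); }

private:
  std::string m_message;
};

/* Inner callback.  If the (simulated) user code fails, stash the
   error in an object and throw it past the intermediate layer.  */
static int
read_memory_callback (bool user_code_fails)
{
  if (user_code_fails)
    {
      pending_error = "RuntimeError: read_memory failed";
      throw saved_error ();
    }
  return 0;
}

/* Outer entry point.  Catches the stashed error and restores it, so
   the caller sees the original error rather than a generic one.  */
static bool
outer_disassemble (bool user_code_fails)
{
  try
    {
      read_memory_callback (user_code_fails);
      return true;
    }
  catch (saved_error &err)
    {
      err.restore ();
      return false;
    }
}

/* The error set inside the callback is visible again afterwards.  */
static void
check_error_round_trip ()
{
  assert (outer_disassemble (false) && pending_error.empty ());
  assert (!outer_disassemble (true));
  assert (pending_error == "RuntimeError: read_memory failed");
}
```

The point of the pattern is that the error state travels inside the
thrown object, so whatever sits between the callback and the outer
function never needs to know about it.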

Changes in v5:

  - Patch #1, minor typo fixes, and reword some comments in line with
    Simon's feedback.  Have not restructured the class hierarchy, this
    was mentioned in Simon's feedback, but he also said he'd accept
    what I have right now.  I think what I have right now does have
    some benefits, so I've stuck with that for now.

  - Patch #2, minor typo fixes based on Simon's feedback.

  - Patch #3, lots of significant changes.

    + Documentation has been updated and expanded significantly,

    + Added a new 'maint info python-disassemblers' command,

    + Removed the memory_source argument to the builtin_disassemble
      function, DisassembleInfo objects can now be sub-classed to
      achieve the same result,

    + Added additional tests to catch more of the error cases, and
      updated the tests that related to the memory_source usage that
      has now been removed.

    + Plus all the minor style issues and typos that Simon pointed
      out.

Changes in v4:

  - Patch #1 from v3 series has been merged,

  - Addressed Eli's feedback on previous series,

  - Rebased onto current upstream/master.

Changes in v3:

  - Rebased to current master, and retested,

  - Patch #1 is new in this series,

  - Patch #2 is changed slightly from v2, I've reworked the
    disassembler classes in a slightly different way now, in order to
    prepare for patches #5 and #6.

  - Patch #3 is unchanged from v2,

  - Patch #4 is unchanged from v2,

  - Patch #5 is new in v3.  I've included it here as the changes in #2
    only make sense knowing that patch #5 is coming,

  - Patch #6 is a small cleanup only possible after #2 and #5 have landed.

Changes in v2:

  - The first 3 patches from the v1 series were merged a while back,
    these were all refactoring, or auxiliary features,

  - There's a new #1 patch in the v2 series that does some new
    refactoring of GDB's disassembler classes, this was required in
    order to simplify the #3 patch,

  - Patch #2 in the v2 series is largely unchanged from patch #4 in
    the v1 series,

  - The syntax highlighting work that was in the v1 series was spun
    out into its own patch, and has been merged separately,

  - The format_address helper function that appeared in the v1 series,
    and that Simon suggested I make more general, was spun out into
    its own patch, and merged separately,

  - Finally, patch #3 in the v2 series is pretty much a complete
    rewrite from the v1 series in order to follow the approach
    suggested by Simon.  Results are now returned directly, either via
    'return' or by raising an exception, in contrast to the original
    approach which involved "setting" the result into an existing
    state object.

---

Andrew Burgess (6):
  gdb/python: convert gdbpy_err_fetch to use gdbpy_ref
  gdb: add new base class to gdb_disassembler
  gdb: add extension language print_insn hook
  gdb/python: implement the print_insn extension language hook
  gdb: refactor the non-printing disassemblers
  gdb: unify two dis_asm_read_memory functions in disasm.c

 gdb/Makefile.in                        |    1 +
 gdb/NEWS                               |   34 +
 gdb/arc-linux-tdep.c                   |   15 +-
 gdb/arc-tdep.c                         |   29 +-
 gdb/arc-tdep.h                         |    5 -
 gdb/arm-tdep.c                         |    4 +-
 gdb/data-directory/Makefile.in         |    1 +
 gdb/disasm-selftests.c                 |   86 +-
 gdb/disasm.c                           |  179 ++--
 gdb/disasm.h                           |  207 ++++-
 gdb/doc/gdb.texinfo                    |   45 +
 gdb/doc/python.texi                    |  328 +++++++
 gdb/extension-priv.h                   |   15 +
 gdb/extension.c                        |   20 +
 gdb/extension.h                        |   10 +
 gdb/guile/guile.c                      |    6 +-
 gdb/mips-tdep.c                        |    4 +-
 gdb/python/lib/gdb/disassembler.py     |  178 ++++
 gdb/python/py-disasm.c                 | 1090 ++++++++++++++++++++++++
 gdb/python/py-utils.c                  |    8 +-
 gdb/python/python-internal.h           |   46 +-
 gdb/python/python.c                    |    3 +
 gdb/s12z-tdep.c                        |   26 +-
 gdb/testsuite/gdb.python/py-disasm.c   |   25 +
 gdb/testsuite/gdb.python/py-disasm.exp |  209 +++++
 gdb/testsuite/gdb.python/py-disasm.py  |  712 ++++++++++++++++
 26 files changed, 3068 insertions(+), 218 deletions(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

-- 
2.25.4


* [PUSHED 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
@ 2022-06-15  9:04             ` Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
                               ` (4 subsequent siblings)
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

Convert the gdbpy_err_fetch class to make use of gdbpy_ref.  This
removes the need for manual reference count management, and allows the
destructor to be removed.

There should be no functional change after this commit.

I think this cleanup is worth doing on its own.  However, in a later
commit I will want to copy instances of gdbpy_err_fetch, and switching
to using gdbpy_ref means that I can rely on the default copy
constructor, without having to add one that handles the reference
counts, so this is also good preparation for that upcoming change.
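
To illustrate why an owning reference type makes the default copy
constructor sufficient, here is a small standalone sketch.  It does not
use Python or GDB at all: toy_object, toy_ref, new_reference, and
error_state are invented for the example, with toy_ref loosely playing
the role of gdbpy_ref<> and error_state the role of gdbpy_err_fetch:

```cpp
#include <cassert>
#include <utility>

/* Toy reference-counted object.  It is never freed here; the sketch
   only tracks the count.  */
struct toy_object
{
  int refcount = 0;
};

/* Mimics an API call that hands back a new reference to OBJ.  */
static toy_object *
new_reference (toy_object &obj)
{
  ++obj.refcount;
  return &obj;
}

/* Toy owning reference, a made-up stand-in for gdbpy_ref<>.  */
class toy_ref
{
public:
  toy_ref () = default;

  /* Copying takes an additional reference.  */
  toy_ref (const toy_ref &other) : m_obj (other.m_obj)
  {
    if (m_obj != nullptr)
      ++m_obj->refcount;
  }

  toy_ref &operator= (toy_ref other)
  {
    std::swap (m_obj, other.m_obj);
    return *this;
  }

  ~toy_ref () { reset (nullptr); }

  /* Drop the current reference, then take ownership of the reference
     the caller already holds on OBJ.  */
  void reset (toy_object *obj)
  {
    if (m_obj != nullptr)
      --m_obj->refcount;
    m_obj = obj;
  }

  /* Give up ownership without dropping the reference.  */
  toy_object *release ()
  {
    toy_object *obj = m_obj;
    m_obj = nullptr;
    return obj;
  }

private:
  toy_object *m_obj = nullptr;
};

/* No destructor and no copy constructor: the members handle both.  */
struct error_state
{
  toy_ref type, value, traceback;
};

static void
check_default_copy ()
{
  toy_object type_obj;
  {
    error_state first;
    first.type.reset (new_reference (type_obj));
    error_state second = first;  /* Compiler-generated copy.  */
    assert (type_obj.refcount == 2);
  }
  /* Both copies went out of scope and dropped their references.  */
  assert (type_obj.refcount == 0);
}
```

With raw pointers the same struct would need a hand-written destructor
and copy constructor to keep the counts balanced.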
---
 gdb/python/py-utils.c        |  8 ++++----
 gdb/python/python-internal.h | 23 ++++++++++-------------
 2 files changed, 14 insertions(+), 17 deletions(-)

diff --git a/gdb/python/py-utils.c b/gdb/python/py-utils.c
index 1bd7b477ecb..58c18c60e2c 100644
--- a/gdb/python/py-utils.c
+++ b/gdb/python/py-utils.c
@@ -194,10 +194,10 @@ gdbpy_err_fetch::to_string () const
      Using str (aka PyObject_Str) will fetch the error message from
      gdb.GdbError ("message").  */
 
-  if (m_error_value && m_error_value != Py_None)
-    return gdbpy_obj_to_string (m_error_value);
+  if (m_error_value.get () != nullptr && m_error_value.get () != Py_None)
+    return gdbpy_obj_to_string (m_error_value.get ());
   else
-    return gdbpy_obj_to_string (m_error_type);
+    return gdbpy_obj_to_string (m_error_type.get ());
 }
 
 /* See python-internal.h.  */
@@ -205,7 +205,7 @@ gdbpy_err_fetch::to_string () const
 gdb::unique_xmalloc_ptr<char>
 gdbpy_err_fetch::type_to_string () const
 {
-  return gdbpy_obj_to_string (m_error_type);
+  return gdbpy_obj_to_string (m_error_type.get ());
 }
 
 /* Convert a GDB exception to the appropriate Python exception.
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index 217bc15bb28..da2e79101a6 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -549,14 +549,12 @@ class gdbpy_err_fetch
 
   gdbpy_err_fetch ()
   {
-    PyErr_Fetch (&m_error_type, &m_error_value, &m_error_traceback);
-  }
+    PyObject *error_type, *error_value, *error_traceback;
 
-  ~gdbpy_err_fetch ()
-  {
-    Py_XDECREF (m_error_type);
-    Py_XDECREF (m_error_value);
-    Py_XDECREF (m_error_traceback);
+    PyErr_Fetch (&error_type, &error_value, &error_traceback);
+    m_error_type.reset (error_type);
+    m_error_value.reset (error_value);
+    m_error_traceback.reset (error_traceback);
   }
 
   /* Call PyErr_Restore using the values stashed in this object.
@@ -565,10 +563,9 @@ class gdbpy_err_fetch
 
   void restore ()
   {
-    PyErr_Restore (m_error_type, m_error_value, m_error_traceback);
-    m_error_type = nullptr;
-    m_error_value = nullptr;
-    m_error_traceback = nullptr;
+    PyErr_Restore (m_error_type.release (),
+		   m_error_value.release (),
+		   m_error_traceback.release ());
   }
 
   /* Return the string representation of the exception represented by
@@ -587,12 +584,12 @@ class gdbpy_err_fetch
 
   bool type_matches (PyObject *type) const
   {
-    return PyErr_GivenExceptionMatches (m_error_type, type);
+    return PyErr_GivenExceptionMatches (m_error_type.get (), type);
   }
 
 private:
 
-  PyObject *m_error_type, *m_error_value, *m_error_traceback;
+  gdbpy_ref<> m_error_type, m_error_value, m_error_traceback;
 };
 
 /* Called before entering the Python interpreter to install the
-- 
2.25.4



* [PUSHED 2/6] gdb: add new base class to gdb_disassembler
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
@ 2022-06-15  9:04             ` Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 3/6] gdb: add extension language print_insn hook Andrew Burgess
                               ` (3 subsequent siblings)
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

The motivation for this change is an upcoming Python disassembler API
that I would like to add.  As part of that change I need to create a
new disassembler-like class that contains a disassemble_info and a
gdbarch.  The management of these two objects is identical to how we
manage these objects within gdb_disassembler, so it might be tempting
for my new class to inherit from gdb_disassembler.

The problem, however, is that gdb_disassembler has a tight connection
between its constructor, and its print_insn method.  In the
constructor the ui_file* that is passed in is replaced with a member
variable string_file*, and then in print_insn, the contents of the
member variable string_file are printed to the original ui_file*.

What this means is that the gdb_disassembler class has a tight
coupling between its constructor and print_insn; the class just isn't
intended to be used in a situation where print_insn is not going to be
called, which is how my (upcoming) sub-class would need to operate.

My solution then, is to separate out the management of the
disassemble_info and gdbarch into a new gdb_disassemble_info class,
and make this class a parent of gdb_disassembler.

In arm-tdep.c and mips-tdep.c, where we used to cast the
disassemble_info->application_data to a gdb_disassembler, we can now
cast to a gdb_disassemble_info as we only need to access the gdbarch
information.

Now, my new Python disassembler sub-class will still want to print
things to an output stream, and so we will want access to the
dis_asm_fprintf functionality for printing.

However, rather than move this printing code into the
gdb_disassemble_info base class, I have added yet another level of
hierarchy, a gdb_printing_disassembler, thus the class structure is
now:

  struct gdb_disassemble_info {};
  struct gdb_printing_disassembler : public gdb_disassemble_info {};
  struct gdb_disassembler : public gdb_printing_disassembler {};

In a later commit my new Python disassembler will inherit from
gdb_printing_disassembler.

The reason for adding the additional layer to the class hierarchy is
that in yet another commit I intend to rewrite the function
gdb_buffered_insn_length, and to do this I will be creating yet more
disassembler-like classes.  These will not print anything, so I will
add a gdb_non_printing_disassembler class that also inherits from
gdb_disassemble_info.  Knowing that this change is coming, I've gone
with the above class hierarchy now.
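
As a rough illustration only (this is not code from the patch), the
split of responsibilities can be modelled in a few lines of Python.
All class and method names below are hypothetical stand-ins for the
C++ classes, and Python cannot express the protected constructors, so
treat this purely as a sketch of which layer owns what:

```python
import io


class DisassembleInfo:
    """Models gdb_disassemble_info: owns the arch and the output stream."""

    def __init__(self, arch, stream):
        self.arch = arch
        self.stream = stream


class PrintingDisassembler(DisassembleInfo):
    """Models gdb_printing_disassembler: adds a default print callback."""

    def fprintf(self, fmt, *args):
        # Writes to whatever stream the base class was given.
        self.stream.write(fmt % args)


class Disassembler(PrintingDisassembler):
    """Models gdb_disassembler: output is buffered, print_insn flushes it."""

    def __init__(self, arch, dest):
        self._buffer = io.StringIO()
        # The print callback writes to the buffer, not to DEST.
        super().__init__(arch, self._buffer)
        self._dest = dest

    def print_insn(self, address):
        self._buffer.seek(0)
        self._buffer.truncate()
        self.fprintf("nop @ 0x%x", address)  # stand-in for libopcodes
        self._dest.write(self._buffer.getvalue())
        return 1  # pretend every instruction is one byte long


class CollectingDisassembler(PrintingDisassembler):
    """A sub-class that never calls print_insn; it only keeps the text."""

    def __init__(self, arch):
        super().__init__(arch, io.StringIO())

    def text(self):
        return self.stream.getvalue()
```

In this model CollectingDisassembler can reuse the print callback
without inheriting the buffer-then-flush behaviour of Disassembler,
which is the reason for the intermediate class.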

There should be no user visible changes after this commit.
---
 gdb/arm-tdep.c  |   4 +-
 gdb/disasm.c    |  58 +++++++++++++-------
 gdb/disasm.h    | 140 ++++++++++++++++++++++++++++++++++++++----------
 gdb/mips-tdep.c |   4 +-
 4 files changed, 154 insertions(+), 52 deletions(-)

diff --git a/gdb/arm-tdep.c b/gdb/arm-tdep.c
index 456649afdaa..fe62617d4bf 100644
--- a/gdb/arm-tdep.c
+++ b/gdb/arm-tdep.c
@@ -8290,8 +8290,8 @@ arm_displaced_step_fixup (struct gdbarch *gdbarch,
 static int
 gdb_print_insn_arm (bfd_vma memaddr, disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   if (arm_pc_is_thumb (gdbarch, memaddr))
diff --git a/gdb/disasm.c b/gdb/disasm.c
index f2df5ef7bc5..6ac84388cc3 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -166,7 +166,8 @@ gdb_disassembler::dis_asm_print_address (bfd_vma addr,
 /* Format disassembler output to STREAM.  */
 
 int
-gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
+gdb_printing_disassembler::fprintf_func (void *stream,
+					 const char *format, ...)
 {
   va_list args;
 
@@ -180,9 +181,9 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
 /* See disasm.h.  */
 
 int
-gdb_disassembler::dis_asm_styled_fprintf (void *stream,
-					  enum disassembler_style style,
-					  const char *format, ...)
+gdb_printing_disassembler::fprintf_styled_func (void *stream,
+						enum disassembler_style style,
+						const char *format, ...)
 {
   va_list args;
 
@@ -797,26 +798,41 @@ get_all_disassembler_options (struct gdbarch *gdbarch)
 
 gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 				    struct ui_file *file,
-				    di_read_memory_ftype read_memory_func)
-  : m_gdbarch (gdbarch),
+				    read_memory_ftype func)
+  : gdb_printing_disassembler (gdbarch, &m_buffer, func,
+			       dis_asm_memory_error, dis_asm_print_address),
     m_buffer (!use_ext_lang_colorization_p && disassembler_styling
 	      && file->can_emit_style_escape ()),
     m_dest (file)
+{ /* Nothing.  */ }
+
+/* See disasm.h.  */
+
+gdb_disassemble_info::gdb_disassemble_info
+  (struct gdbarch *gdbarch, struct ui_file *stream,
+   read_memory_ftype read_memory_func, memory_error_ftype memory_error_func,
+   print_address_ftype print_address_func, fprintf_ftype fprintf_func,
+   fprintf_styled_ftype fprintf_styled_func)
+    : m_gdbarch (gdbarch)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
-			 dis_asm_styled_fprintf);
+  gdb_assert (fprintf_func != nullptr);
+  gdb_assert (fprintf_styled_func != nullptr);
+  init_disassemble_info (&m_di, stream, fprintf_func,
+			 fprintf_styled_func);
   m_di.flavour = bfd_target_unknown_flavour;
-  m_di.memory_error_func = dis_asm_memory_error;
-  m_di.print_address_func = dis_asm_print_address;
-  /* NOTE: cagney/2003-04-28: The original code, from the old Insight
-     disassembler had a local optimization here.  By default it would
-     access the executable file, instead of the target memory (there
-     was a growing list of exceptions though).  Unfortunately, the
-     heuristic was flawed.  Commands like "disassemble &variable"
-     didn't work as they relied on the access going to the target.
-     Further, it has been superseeded by trust-read-only-sections
-     (although that should be superseeded by target_trust..._p()).  */
-  m_di.read_memory_func = read_memory_func;
+
+  /* The memory_error_func, print_address_func, and read_memory_func are
+     all initialized to a default (non-nullptr) value by the call to
+     init_disassemble_info above.  If the user is overriding these fields
+     (by passing non-nullptr values) then do that now, otherwise, leave
+     these fields as the defaults.  */
+  if (memory_error_func != nullptr)
+    m_di.memory_error_func = memory_error_func;
+  if (print_address_func != nullptr)
+    m_di.print_address_func = print_address_func;
+  if (read_memory_func != nullptr)
+    m_di.read_memory_func = read_memory_func;
+
   m_di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
   m_di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
   m_di.endian = gdbarch_byte_order (gdbarch);
@@ -828,7 +844,9 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
   disassemble_init_for_target (&m_di);
 }
 
-gdb_disassembler::~gdb_disassembler ()
+/* See disasm.h.  */
+
+gdb_disassemble_info::~gdb_disassemble_info ()
 {
   disassemble_free_target (&m_di);
 }
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 7efab7db46c..f31ca92b038 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -26,43 +26,137 @@ struct gdbarch;
 struct ui_out;
 struct ui_file;
 
-class gdb_disassembler
-{
-  using di_read_memory_ftype = decltype (disassemble_info::read_memory_func);
-
-public:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
-    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
-  {}
+/* A wrapper around a disassemble_info and a gdbarch.  This is the core
+   set of data that all disassembler sub-classes will need.  This class
+   doesn't actually implement the disassembling process, that is something
+   that sub-classes will do, with each sub-class doing things slightly
+   differently.
 
-  ~gdb_disassembler ();
+   The constructor of this class is protected, you should not create
+   instances of this class directly, instead create an instance of an
+   appropriate sub-class.  */
 
-  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
-
-  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+struct gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info);
 
-  /* Return the gdbarch of gdb_disassembler.  */
+  /* Return the gdbarch we are disassembling for.  */
   struct gdbarch *arch ()
   { return m_gdbarch; }
 
+  /* Return a pointer to the disassemble_info, this will be needed for
+     passing into the libopcodes disassembler.  */
+  struct disassemble_info *disasm_info ()
+  { return &m_di; }
+
 protected:
-  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
-		    di_read_memory_ftype func);
 
+  /* Types for the function callbacks within m_di.  */
+  using read_memory_ftype = decltype (disassemble_info::read_memory_func);
+  using memory_error_ftype = decltype (disassemble_info::memory_error_func);
+  using print_address_ftype = decltype (disassemble_info::print_address_func);
+  using fprintf_ftype = decltype (disassemble_info::fprintf_func);
+  using fprintf_styled_ftype = decltype (disassemble_info::fprintf_styled_func);
+
+  /* Constructor, many fields in m_di are initialized from GDBARCH.  STREAM
+     is where the output of the disassembler will be written to, the
+     remaining arguments are function callbacks that are written into
+     m_di.  Of these function callbacks FPRINTF_FUNC and
+     FPRINTF_STYLED_FUNC must not be nullptr.  If READ_MEMORY_FUNC,
+     MEMORY_ERROR_FUNC, or PRINT_ADDRESS_FUNC are nullptr, then that field
+     within m_di is left with its default value (see the libopcodes
+     function init_disassemble_info for the defaults).  */
+  gdb_disassemble_info (struct gdbarch *gdbarch,
+			struct ui_file *stream,
+			read_memory_ftype read_memory_func,
+			memory_error_ftype memory_error_func,
+			print_address_ftype print_address_func,
+			fprintf_ftype fprintf_func,
+			fprintf_styled_ftype fprintf_styled_func);
+
+  /* Destructor.  */
+  virtual ~gdb_disassemble_info ();
+
+  /* The stream that disassembler output is being written to.  */
   struct ui_file *stream ()
   { return (struct ui_file *) m_di.stream; }
 
-private:
-  struct gdbarch *m_gdbarch;
-
   /* Stores data required for disassembling instructions in
      opcodes.  */
   struct disassemble_info m_di;
 
+private:
+  /* The architecture we are disassembling for.  */
+  struct gdbarch *m_gdbarch;
+
   /* If we own the string in `m_di.disassembler_options', we do so
      using this field.  */
   std::string m_disassembler_options_holder;
+};
+
+/* A wrapper around gdb_disassemble_info.  This class adds default
+   print functions that are supplied to the disassemble_info within the
+   parent class.  These default print functions write to the stream, which
+   is also contained in the parent class.
+
+   As with the parent class, the constructor for this class is protected,
+   you should not create instances of this class, but create an
+   appropriate sub-class instead.  */
 
+struct gdb_printing_disassembler : public gdb_disassemble_info
+{
+  DISABLE_COPY_AND_ASSIGN (gdb_printing_disassembler);
+
+protected:
+
+  /* Constructor.  All the arguments are just passed to the parent class.
+     We also add the two print functions to the arguments passed to the
+     parent.  See gdb_disassemble_info for a description of how the
+     arguments are handled.  */
+  gdb_printing_disassembler (struct gdbarch *gdbarch,
+			     struct ui_file *stream,
+			     read_memory_ftype read_memory_func,
+			     memory_error_ftype memory_error_func,
+			     print_address_ftype print_address_func)
+    : gdb_disassemble_info (gdbarch, stream, read_memory_func,
+			    memory_error_func, print_address_func,
+			    fprintf_func, fprintf_styled_func)
+  { /* Nothing.  */ }
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this writes to STREAM, which will be m_di.stream.  */
+  static int fprintf_styled_func (void *stream,
+				  enum disassembler_style style,
+				  const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A disassembler class that provides 'print_insn', a method for
+   disassembling a single instruction to the output stream.  */
+
+struct gdb_disassembler : public gdb_printing_disassembler
+{
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
+    : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+  DISABLE_COPY_AND_ASSIGN (gdb_disassembler);
+
+  /* Disassemble a single instruction at MEMADDR to the ui_file* that was
+     passed to the constructor.  If a memory error occurs while
+     disassembling this instruction then an error will be thrown.  */
+  int print_insn (CORE_ADDR memaddr, int *branch_delay_insns = NULL);
+
+protected:
+  gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file,
+		    read_memory_ftype func);
+
+private:
   /* This member variable is given a value by calling dis_asm_memory_error.
      If after calling into the libopcodes disassembler we get back a
      negative value (which indicates an error), then, if this variable has
@@ -95,16 +189,6 @@ class gdb_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_fprintf (void *stream, const char *format, ...)
-    ATTRIBUTE_PRINTF(2,3);
-
-  /* Print formatted message to STREAM, the content can be styled based on
-     STYLE if desired.  */
-  static int dis_asm_styled_fprintf (void *stream,
-				     enum disassembler_style style,
-				     const char *format, ...)
-    ATTRIBUTE_PRINTF(3,4);
-
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/gdb/mips-tdep.c b/gdb/mips-tdep.c
index 805c5beba59..65aa86dd98d 100644
--- a/gdb/mips-tdep.c
+++ b/gdb/mips-tdep.c
@@ -7021,8 +7021,8 @@ reinit_frame_cache_sfunc (const char *args, int from_tty,
 static int
 gdb_print_insn_mips (bfd_vma memaddr, struct disassemble_info *info)
 {
-  gdb_disassembler *di
-    = static_cast<gdb_disassembler *>(info->application_data);
+  gdb_disassemble_info *di
+    = static_cast<gdb_disassemble_info *> (info->application_data);
   struct gdbarch *gdbarch = di->arch ();
 
   /* FIXME: cagney/2003-06-26: Is this even necessary?  The
-- 
2.25.4



* [PUSHED 3/6] gdb: add extension language print_insn hook
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
@ 2022-06-15  9:04             ` Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
                               ` (2 subsequent siblings)
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit is preparation for the next commit.

In the next commit I will add a Python API to intercept the print_insn
calls within GDB.  Each print_insn call is responsible for
disassembling and printing one instruction.  After the next commit it
will be possible for a user to write Python code that either wraps
around the existing disassembler, or even, in extreme situations,
entirely replaces the existing disassembler.

This commit does not add any new Python API.

What this commit does is put the extension language framework in place
for a print_insn hook.  There's a new callback added to 'struct
extension_language_ops', which is then filled in with nullptr for Python
and Guile.

Finally, in the disassembler, the code is restructured so that the new
extension language function ext_lang_print_insn is called before we
delegate to gdbarch_print_insn.

After this, the next commit can focus entirely on providing a Python
implementation of the new print_insn callback.
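
The dispatch order described above can be sketched as follows.  This
is a small Python model written for illustration, not the C++ that the
patch adds; the function names mirror the patch, but the hook objects
and the stand-in builtin are assumptions made up for the example.  A
hook returning None plays the role of an empty gdb::optional<int>:

```python
# Illustrative model only: the real logic is C++ in gdb/disasm.c and
# gdb/extension.c.  The hooks and the builtin below are hypothetical.

def ext_lang_print_insn(hooks, address):
    """Return the first non-None length reported by a hook, else None."""
    for hook in hooks:
        if hook is None:
            # This extension language does not implement print_insn.
            continue
        length = hook(address)
        if length is not None:
            return length
    return None


def print_insn(hooks, builtin, address):
    """Try the extension hooks first, then fall back to the builtin."""
    length = ext_lang_print_insn(hooks, address)
    if length is not None:
        return length
    return builtin(address)


def declining_hook(address):
    return None  # "I cannot disassemble this", so let others try.


def two_byte_hook(address):
    return 2  # Claims a 2-byte instruction at any address.


def builtin(address):
    return 4  # Stand-in for gdbarch_print_insn.
```

With only None entries or declining hooks registered, print_insn ends
up in the builtin, which matches the state after this commit, where
both Python and Guile supply a null callback.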

There should be no user visible change after this commit.
---
 gdb/disasm.c         | 29 ++++++++++++++++++++++++++---
 gdb/extension-priv.h | 15 +++++++++++++++
 gdb/extension.c      | 20 ++++++++++++++++++++
 gdb/extension.h      | 10 ++++++++++
 gdb/guile/guile.c    |  6 +++++-
 gdb/python/python.c  |  2 ++
 6 files changed, 78 insertions(+), 4 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 6ac84388cc3..4af40c916b2 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -851,6 +851,29 @@ gdb_disassemble_info::~gdb_disassemble_info ()
   disassemble_free_target (&m_di);
 }
 
+/* Wrapper around calling gdbarch_print_insn.  This function takes care of
+   first calling the extension language hooks for print_insn, and, if none
+   of the extension languages can print this instruction, calls
+   gdbarch_print_insn to do the work.
+
+   GDBARCH is the architecture to disassemble in, VMA is the address of the
+   instruction being disassembled, and INFO is the libopcodes disassembler
+   related information.  */
+
+static int
+gdb_print_insn_1 (struct gdbarch *gdbarch, CORE_ADDR vma,
+		  struct disassemble_info *info)
+{
+  /* Call into the extension languages to do the disassembly.  */
+  gdb::optional<int> length = ext_lang_print_insn (gdbarch, vma, info);
+  if (length.has_value ())
+    return *length;
+
+  /* No extension language wanted to do the disassembly, so do it
+     manually.  */
+  return gdbarch_print_insn (gdbarch, vma, info);
+}
+
 /* See disasm.h.  */
 
 bool gdb_disassembler::use_ext_lang_colorization_p = true;
@@ -864,7 +887,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
   m_err_memaddr.reset ();
   m_buffer.clear ();
 
-  int length = gdbarch_print_insn (arch (), memaddr, &m_di);
+  int length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 
   /* If we have successfully disassembled an instruction, styling is on, we
      think that the extension language might be able to perform styling for
@@ -899,7 +922,7 @@ gdb_disassembler::print_insn (CORE_ADDR memaddr,
 	  gdb_assert (!m_buffer.term_out ());
 	  m_buffer.~string_file ();
 	  new (&m_buffer) string_file (true);
-	  length = gdbarch_print_insn (arch (), memaddr, &m_di);
+	  length = gdb_print_insn_1 (arch (), memaddr, &m_di);
 	  gdb_assert (length > 0);
 	}
     }
@@ -1054,7 +1077,7 @@ gdb_buffered_insn_length (struct gdbarch *gdbarch,
   gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
 				     &disassembler_options_holder);
 
-  int result = gdbarch_print_insn (gdbarch, addr, &di);
+  int result = gdb_print_insn_1 (gdbarch, addr, &di);
   disassemble_free_target (&di);
   return result;
 }
diff --git a/gdb/extension-priv.h b/gdb/extension-priv.h
index d9450b51231..7c74e721c57 100644
--- a/gdb/extension-priv.h
+++ b/gdb/extension-priv.h
@@ -263,6 +263,21 @@ struct extension_language_ops
      contents, or an empty optional.  */
   gdb::optional<std::string> (*colorize_disasm) (const std::string &content,
 						 gdbarch *gdbarch);
+
+  /* Print a single instruction from ADDRESS in architecture GDBARCH.  INFO
+     is the standard libopcodes disassemble_info structure.  Bytes for the
+     instruction being printed should be read using INFO->read_memory_func
+     as the actual instruction bytes might be in a buffer.
+
+     Use INFO->fprintf_func to print the results of the disassembly, and
+     return the length of the instruction.
+
+     If no instruction can be disassembled then return an empty value and
+     other extension languages will get a chance to perform the
+     disassembly.  */
+  gdb::optional<int> (*print_insn) (struct gdbarch *gdbarch,
+				    CORE_ADDR address,
+				    struct disassemble_info *info);
 };
 
 /* State necessary to restore a signal handler to its previous value.  */
diff --git a/gdb/extension.c b/gdb/extension.c
index 8f39b86e952..5a805bea00e 100644
--- a/gdb/extension.c
+++ b/gdb/extension.c
@@ -924,6 +924,26 @@ ext_lang_colorize_disasm (const std::string &content, gdbarch *gdbarch)
   return result;
 }
 
+/* See extension.h.  */
+
+gdb::optional<int>
+ext_lang_print_insn (struct gdbarch *gdbarch, CORE_ADDR address,
+		     struct disassemble_info *info)
+{
+  for (const struct extension_language_defn *extlang : extension_languages)
+    {
+      if (extlang->ops == nullptr
+	  || extlang->ops->print_insn == nullptr)
+	continue;
+      gdb::optional<int> length
+	= extlang->ops->print_insn (gdbarch, address, info);
+      if (length.has_value ())
+	return length;
+    }
+
+  return {};
+}
+
 /* Called via an observer before gdb prints its prompt.
    Iterate over the extension languages giving them a chance to
    change the prompt.  The first one to change the prompt wins,
diff --git a/gdb/extension.h b/gdb/extension.h
index 7eb89530c44..47839ea50be 100644
--- a/gdb/extension.h
+++ b/gdb/extension.h
@@ -327,6 +327,16 @@ extern gdb::optional<std::string> ext_lang_colorize
 extern gdb::optional<std::string> ext_lang_colorize_disasm
   (const std::string &content, gdbarch *gdbarch);
 
+/* Calls extension_language_ops::print_insn for each extension language,
+   returning the result from the first extension language that returns a
+   non-empty result (any further extension languages are not then called).
+
+   All arguments are forwarded to extension_language_ops::print_insn, see
+   that function for a full description.  */
+
+extern gdb::optional<int> ext_lang_print_insn
+  (struct gdbarch *gdbarch, CORE_ADDR address, struct disassemble_info *info);
+
 #if GDB_SELF_TEST
 namespace selftests {
 extern void (*hook_set_active_ext_lang) ();
diff --git a/gdb/guile/guile.c b/gdb/guile/guile.c
index c7be48fb739..14b191ded62 100644
--- a/gdb/guile/guile.c
+++ b/gdb/guile/guile.c
@@ -130,8 +130,12 @@ static const struct extension_language_ops guile_extension_ops =
   gdbscm_breakpoint_has_cond,
   gdbscm_breakpoint_cond_says_stop,
 
-  NULL, /* gdbscm_check_quit_flag, */
   NULL, /* gdbscm_set_quit_flag, */
+  NULL, /* gdbscm_check_quit_flag, */
+  NULL, /* gdbscm_before_prompt, */
+  NULL, /* gdbscm_get_matching_xmethod_workers */
+  NULL, /* gdbscm_colorize */
+  NULL, /* gdbscm_print_insn */
 };
 #endif
 
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 9bef2252e88..97de5f5cee5 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -166,6 +166,8 @@ static const struct extension_language_ops python_extension_ops =
   gdbpy_colorize,
 
   gdbpy_colorize_disasm,
+
+  NULL, /* gdbpy_print_insn, */
 };
 
 #endif /* HAVE_PYTHON */
-- 
2.25.4



* [PUSHED 4/6] gdb/python: implement the print_insn extension language hook
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
                               ` (2 preceding siblings ...)
  2022-06-15  9:04             ` [PUSHED 3/6] gdb: add extension language print_insn hook Andrew Burgess
@ 2022-06-15  9:04             ` Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

From: Andrew Burgess <andrew.burgess@embecosm.com>

This commit extends the Python API to include disassembler support.

The motivation for this commit was to provide an API by which the user
could write Python scripts that would augment the output of the
disassembler.

To achieve this I have followed the model of the existing libopcodes
disassembler, that is, instructions are disassembled one by one.  This
does restrict the type of things that it is possible to do from a
Python script, i.e. all additional output has to fit on a single line,
but this was all I needed, and creating something more complex would,
I think, require greater changes to how GDB's internal disassembler
operates.

The disassembler API is contained in the new gdb.disassembler module,
which defines the following classes:

  DisassembleInfo

      Similar to the libopcodes disassemble_info structure.  Has the
  read-only properties address, architecture, and progspace, and the
  methods __init__, read_memory, and is_valid.

      Each time GDB wants an instruction disassembled, an instance of
  this class is passed to a user-written disassembler function.  By
  reading the properties and calling the methods (and other support
  functions in the gdb.disassembler module), the user can perform and
  return the disassembly.

  Disassembler

      This is a base-class which user written disassemblers should
  inherit from.  This base class provides base implementations of
  __init__ and __call__ which the user written disassembler should
  override.

  DisassemblerResult

      This class can be used to hold the result of a call to the
  disassembler, it's really just a wrapper around a string (the text
  of the disassembled instruction) and a length (in bytes).  The user
  can return an instance of this class from Disassembler.__call__ to
  represent the newly disassembled instruction.

The gdb.disassembler module also provides the following functions:

  register_disassembler

      This function registers an instance of a Disassembler sub-class
  as a disassembler, either for one specific architecture, or, as a
  global disassembler for all architectures.

  builtin_disassemble

      This provides access to GDB's builtin disassembler.  A common
  use case that I see is augmenting the existing disassembler output.
  The user code can call this function to have GDB disassemble the
  instruction in the normal way.  The user gets back a
  DisassemblerResult object, which they can then read in order to
  augment the disassembler output in any way they wish.

      This function also provides a mechanism to intercept the
  disassembler's reads of memory, thus the user can adjust what GDB
  sees when it is disassembling.

The included documentation provides a more detailed description of the
API.
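
To give a feel for how these pieces fit together, here is a minimal
sketch of a disassembler that augments the builtin output.  It uses
only the names listed above.  The keyword arguments passed to
DisassemblerResult, the class name AnnotatingDisassembler, and the
helper annotate are my assumptions for this example; check the
included documentation for the exact signatures.  The import guard is
there only so the helper can be exercised outside GDB:

```python
try:
    import gdb.disassembler as gdb_disassembler
except ImportError:
    # Not running inside GDB, only the plain helper below is usable.
    gdb_disassembler = None


def annotate(text, address):
    """Append a comment with the instruction address (hypothetical helper)."""
    return "%s\t## at 0x%x" % (text, address)


if gdb_disassembler is not None:

    class AnnotatingDisassembler(gdb_disassembler.Disassembler):
        """Wrap the builtin disassembler and add a trailing comment."""

        def __init__(self):
            # The name is currently only used in some debug output.
            super().__init__("AnnotatingDisassembler")

        def __call__(self, info):
            # Let GDB disassemble the instruction in the normal way.
            result = gdb_disassembler.builtin_disassemble(info)
            # Assumed constructor keywords: length and string.
            return gdb_disassembler.DisassemblerResult(
                length=result.length,
                string=annotate(result.string, info.address),
            )

    # None registers the disassembler for all architectures.
    gdb_disassembler.register_disassembler(AnnotatingDisassembler(), None)
```

Because each instruction is disassembled on its own, everything that
annotate adds has to fit on the same line as the instruction.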

There is also a new CLI command added:

  maint info python-disassemblers

This command is defined in the Python gdb.disassembler module, and
can be used to list the currently registered Python disassemblers.
---
 gdb/Makefile.in                        |    1 +
 gdb/NEWS                               |   34 +
 gdb/data-directory/Makefile.in         |    1 +
 gdb/doc/gdb.texinfo                    |   45 +
 gdb/doc/python.texi                    |  328 +++++++
 gdb/python/lib/gdb/disassembler.py     |  178 ++++
 gdb/python/py-disasm.c                 | 1090 ++++++++++++++++++++++++
 gdb/python/python-internal.h           |   23 +
 gdb/python/python.c                    |    3 +-
 gdb/testsuite/gdb.python/py-disasm.c   |   25 +
 gdb/testsuite/gdb.python/py-disasm.exp |  209 +++++
 gdb/testsuite/gdb.python/py-disasm.py  |  712 ++++++++++++++++
 12 files changed, 2648 insertions(+), 1 deletion(-)
 create mode 100644 gdb/python/lib/gdb/disassembler.py
 create mode 100644 gdb/python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.c
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.exp
 create mode 100644 gdb/testsuite/gdb.python/py-disasm.py

diff --git a/gdb/Makefile.in b/gdb/Makefile.in
index d80087749de..911daa2607b 100644
--- a/gdb/Makefile.in
+++ b/gdb/Makefile.in
@@ -393,6 +393,7 @@ SUBDIR_PYTHON_SRCS = \
 	python/py-cmd.c \
 	python/py-connection.c \
 	python/py-continueevent.c \
+	python/py-disasm.c \
 	python/py-event.c \
 	python/py-evtregistry.c \
 	python/py-evts.c \
diff --git a/gdb/NEWS b/gdb/NEWS
index 0efcce7d3ef..ac9a1aacd34 100644
--- a/gdb/NEWS
+++ b/gdb/NEWS
@@ -63,6 +63,40 @@ maintenance info line-table
   ** New method gdb.Frame.language that returns the name of the
      frame's language.
 
+  ** New Python API for wrapping GDB's disassembler:
+
+     - gdb.disassembler.register_disassembler(DISASSEMBLER, ARCH).
+       DISASSEMBLER is a sub-class of gdb.disassembler.Disassembler.
+       ARCH is either None or a string containing a bfd architecture
+       name.  DISASSEMBLER is registered as a disassembler for
+       architecture ARCH, or for all architectures if ARCH is None.
+       The previous disassembler registered for ARCH is returned, this
+       can be None if no previous disassembler was registered.
+
+     - gdb.disassembler.Disassembler is the class from which all
+       disassemblers should inherit.  Its constructor takes a string,
+       a name for the disassembler, which is currently only used in
+       some debug output.  Sub-classes should override the __call__
+       method to perform disassembly, invoking __call__ on this base
+       class will raise an exception.
+
+     - gdb.disassembler.DisassembleInfo is the class used to describe
+       a single disassembly request from GDB.  An instance of this
+       class is passed to the __call__ method of
+       gdb.disassembler.Disassembler and has the following read-only
+       attributes: 'address', 'architecture', and 'progspace', as well
+       as the following methods: 'is_valid' and 'read_memory'.
+
+     - gdb.disassembler.builtin_disassemble(INFO, MEMORY_SOURCE),
+       calls GDB's builtin disassembler on INFO, which is a
+       gdb.disassembler.DisassembleInfo object.  MEMORY_SOURCE is
+       optional, its default value is None.  If MEMORY_SOURCE is not
+       None then it must be an object that has a 'read_memory' method.
+
+     - gdb.disassembler.DisassemblerResult is a class that can be used
+       to wrap the result of a call to a Disassembler.  It has
+       read-only attributes 'length' and 'string'.
+
 *** Changes in GDB 12
 
 * DBX mode is deprecated, and will be removed in GDB 13
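Note on the NEWS entry above: the relationship between the base class,
a sub-class, and the result type can be sketched outside of GDB.  This
is only a rough model under stated assumptions; Disassembler, Result,
and fake_builtin here are hypothetical stand-ins written for
illustration, not the classes this patch adds.

```python
class Disassembler:
    """Stand-in for gdb.disassembler.Disassembler (not the real class)."""

    def __init__(self, name):
        self.name = name

    def __call__(self, info):
        # The base class refuses to disassemble; sub-classes override this.
        raise NotImplementedError("Disassembler.__call__")


class Result:
    """Stand-in for DisassemblerResult: a positive length and a string."""

    def __init__(self, length, string):
        if length <= 0 or not string:
            raise ValueError("length must be > 0 and string non-empty")
        self.length = length
        self.string = string


def fake_builtin(info):
    """Hypothetical placeholder for builtin_disassemble."""
    return Result(1, "nop")


class CommentingDisassembler(Disassembler):
    """Wrap the (fake) builtin disassembler and append a comment."""

    def __init__(self):
        super().__init__("CommentingDisassembler")

    def __call__(self, info):
        result = fake_builtin(info)
        return Result(result.length, result.string + "\t## Comment")
```

In real use the sub-class would call gdb.disassembler.builtin_disassemble
and return a gdb.disassembler.DisassemblerResult instead of the stand-ins.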
diff --git a/gdb/data-directory/Makefile.in b/gdb/data-directory/Makefile.in
index b606fc654b5..cf5226f3961 100644
--- a/gdb/data-directory/Makefile.in
+++ b/gdb/data-directory/Makefile.in
@@ -69,6 +69,7 @@ PYTHON_DIR = python
 PYTHON_INSTALL_DIR = $(DESTDIR)$(GDB_DATADIR)/$(PYTHON_DIR)
 PYTHON_FILE_LIST = \
 	gdb/__init__.py \
+	gdb/disassembler.py \
 	gdb/FrameDecorator.py \
 	gdb/FrameIterator.py \
 	gdb/frames.py \
diff --git a/gdb/doc/gdb.texinfo b/gdb/doc/gdb.texinfo
index 3a8cf3f1b6b..2178b476f53 100644
--- a/gdb/doc/gdb.texinfo
+++ b/gdb/doc/gdb.texinfo
@@ -39680,6 +39680,51 @@
 @item maint info jit
 Print information about JIT code objects loaded in the current inferior.
 
+@anchor{maint info python-disassemblers}
+@kindex maint info python-disassemblers
+@item maint info python-disassemblers
+This command is defined within the @code{gdb.disassembler} Python
+module (@pxref{Disassembly In Python}), and will only be present after
+that module has been imported.  To force the module to be imported do
+the following:
+
+@smallexample
+(@value{GDBP}) python import gdb.disassembler
+@end smallexample
+
+This command lists all the architectures for which a disassembler is
+currently registered, and the name of the disassembler.  If a
+disassembler is registered for all architectures, then this is listed
+last against the @samp{GLOBAL} architecture.
+
+If one of the disassemblers would be selected for the architecture of
+the current inferior, then this disassembler will be marked.
+
+The following example shows a situation in which two disassemblers
+are registered.  Initially, the @samp{i386} disassembler matches the
+current architecture.  After the architecture is changed to
+@samp{arm}, no architecture specific disassembler matches, so the
+@samp{GLOBAL} disassembler is marked instead.
+
+@smallexample
+@group
+(@value{GDBP}) show architecture
+The target architecture is set to "auto" (currently "i386").
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassember Name
+i386                Disassembler_1	(Matches current architecture)
+GLOBAL              Disassembler_2
+@end group
+@group
+(@value{GDBP}) set architecture arm
+The target architecture is set to "arm".
+(@value{GDBP}) maint info python-disassemblers
+Architecture        Disassember Name
+i386                Disassembler_1
+GLOBAL              Disassembler_2	(Matches current architecture)
+@end group
+@end smallexample
+
 @kindex set displaced-stepping
 @kindex show displaced-stepping
 @cindex displaced stepping support
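Note on the listing described above: the marking rule (an architecture
specific match takes the tag, otherwise the GLOBAL row does) can be
modelled in plain Python.  This is a sketch only; format_table and its
arguments are hypothetical names, the registry maps architecture names
to disassembler names rather than to objects, and the column width
chosen here is an assumption rather than what GDB prints.

```python
def format_table(registry, current_arch):
    """Return the listing lines for REGISTRY, a dict mapping an
    architecture name (or None for the global entry) to a disassembler
    name.  CURRENT_ARCH is the architecture name to mark."""
    tag = "\t(Matches current architecture)"
    arch_names = [a for a in registry if a is not None]
    # Size the first column from the architecture names and the header.
    width = max([len("Architecture")] + [len(a) for a in arch_names])
    fmt = "{:" + str(width) + "s} {:s}"
    lines = [fmt.format("Architecture", "Disassembler Name")]
    for arch in arch_names:
        name = registry[arch]
        if arch == current_arch:
            name += tag
            tag = ""  # An architecture specific match wins over GLOBAL.
        lines.append(fmt.format(arch, name))
    if None in registry:
        # TAG is still set here only if no specific entry matched.
        lines.append(fmt.format("GLOBAL", registry[None] + tag))
    return lines
```

Calling format_table with the same registry and a different
current_arch moves the tag between rows, which is the behaviour the
two transcripts in the documentation illustrate.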
diff --git a/gdb/doc/python.texi b/gdb/doc/python.texi
index aaf7666e0be..75804ef975e 100644
--- a/gdb/doc/python.texi
+++ b/gdb/doc/python.texi
@@ -222,6 +222,7 @@
 * Registers In Python::         Python representation of registers.
 * Connections In Python::	Python representation of connections.
 * TUI Windows In Python::       Implementing new TUI windows.
+* Disassembly In Python::       Instruction Disassembly In Python.
 @end menu
 
 @node Basic Python
@@ -599,6 +600,7 @@
 related prompts are prohibited from being changed.
 @end defun
 
+@anchor{gdb_architecture_names}
 @defun gdb.architecture_names ()
 Return a list containing all of the architecture names that the
 current build of @value{GDBN} supports.  Each architecture name is a
@@ -3287,6 +3289,7 @@
 particular frame (@pxref{Frames In Python}).
 @end defun
 
+@anchor{gdbpy_inferior_read_memory}
 @findex Inferior.read_memory
 @defun Inferior.read_memory (address, length)
 Read @var{length} addressable memory units from the inferior, starting at
@@ -6575,6 +6578,331 @@
 values can be 1 (left), 2 (middle), or 3 (right).
 @end defun
 
+@node Disassembly In Python
+@subsubsection Instruction Disassembly In Python
+@cindex python instruction disassembly
+
+@value{GDBN}'s builtin disassembler can be extended, or even replaced,
+using the Python API.  The disassembler related features are contained
+within the @code{gdb.disassembler} module:
+
+@deftp {class} gdb.disassembler.DisassembleInfo
+Disassembly is driven by instances of this class.  Each time
+@value{GDBN} needs to disassemble an instruction, an instance of this
+class is created and passed to a registered disassembler.  The
+disassembler is then responsible for disassembling an instruction and
+returning a result.
+
+Instances of this type are usually created within @value{GDBN}.
+However, it is possible to create a copy of an instance of this type;
+see the description of @code{__init__} for more details.
+
+This class has the following properties and methods:
+
+@defvar DisassembleInfo.address
+A read-only integer containing the address at which @value{GDBN}
+wishes to disassemble a single instruction.
+@end defvar
+
+@defvar DisassembleInfo.architecture
+The @code{gdb.Architecture} (@pxref{Architectures In Python}) for
+which @value{GDBN} is currently disassembling.  This property is
+read-only.
+@end defvar
+
+@defvar DisassembleInfo.progspace
+The @code{gdb.Progspace} (@pxref{Progspaces In Python,,Program Spaces
+In Python}) for which @value{GDBN} is currently disassembling.  This
+property is read-only.
+@end defvar
+
+@defun DisassembleInfo.is_valid ()
+Returns @code{True} if the @code{DisassembleInfo} object is valid,
+@code{False} if not.  A @code{DisassembleInfo} object becomes
+invalid once the disassembly call for which it was created has
+returned.  Calling other @code{DisassembleInfo} methods, or accessing
+@code{DisassembleInfo} properties, on an invalid object will raise a
+@code{RuntimeError} exception.
+@end defun
+
+@defun DisassembleInfo.__init__ (info)
+This can be used to create a new @code{DisassembleInfo} object that is
+a copy of @var{info}.  The copy will have the same @code{address},
+@code{architecture}, and @code{progspace} values as @var{info}, and
+will become invalid at the same time as @var{info}.
+
+This method exists so that sub-classes of @code{DisassembleInfo} can
+be created.  Such sub-classes must be initialized as copies of an
+existing @code{DisassembleInfo} object, but may override the
+@code{read_memory} method, and so control what @value{GDBN} sees
+when reading from memory
+(@pxref{builtin_disassemble}).
+@end defun
+
+@defun DisassembleInfo.read_memory (length, offset)
+This method allows the disassembler to read the bytes of the
+instruction to be disassembled.  The method reads @var{length} bytes,
+starting at @var{offset} from
+@code{DisassembleInfo.address}.
+
+It is important that the disassembler read the instruction bytes using
+this method, rather than reading inferior memory directly, as in some
+cases @value{GDBN} disassembles from an internal buffer rather than
+directly from inferior memory; calling this method handles this
+detail.
+
+Returns a buffer object, which behaves much like an array or a string,
+just as @code{Inferior.read_memory} does
+(@pxref{gdbpy_inferior_read_memory,,Inferior.read_memory}).  The
+length of the returned buffer will always be exactly @var{length}.
+
+If @value{GDBN} is unable to read the required memory then a
+@code{gdb.MemoryError} exception is raised (@pxref{Exception
+Handling}).
+
+This method can be overridden by a sub-class in order to control what
+@value{GDBN} sees when reading from memory
+(@pxref{builtin_disassemble}).  When overriding this method it is
+important to understand how @code{builtin_disassemble} makes use of
+this method.
+
+While disassembling a single instruction there could be multiple calls
+to this method, and the same bytes might be read multiple times.  Any
+single call might only read a subset of the total instruction bytes.
+
+If an implementation of @code{read_memory} is unable to read the
+requested memory contents, for example, if there's a request to read
+from an invalid memory address, then a @code{gdb.MemoryError} should
+be raised.
+
+Raising a @code{MemoryError} inside @code{read_memory} does not
+automatically mean a @code{MemoryError} will be raised by
+@code{builtin_disassemble}.  It is possible that @value{GDBN}'s builtin
+disassembler is probing to see how many bytes are available.  When
+@code{read_memory} raises the @code{MemoryError} the builtin
+disassembler might be able to perform a complete disassembly with the
+bytes it has available; in this case @code{builtin_disassemble} will
+not itself raise a @code{MemoryError}.
+
+Any other exception type raised in @code{read_memory} will propagate
+back and be re-raised by @code{builtin_disassemble}.
+@end defun
+@end deftp
+
+@deftp {class} Disassembler
+This is a base class from which all user implemented disassemblers
+must inherit.
+
+@defun Disassembler.__init__ (name)
+The constructor takes @var{name}, a string, which should be a short
+name for this disassembler.
+@end defun
+
+@defun Disassembler.__call__ (info)
+The @code{__call__} method must be overridden by sub-classes to
+perform disassembly.  Calling @code{__call__} on this base class will
+raise a @code{NotImplementedError} exception.
+
+The @var{info} argument is an instance of @code{DisassembleInfo}, and
+describes the instruction that @value{GDBN} wants disassembling.
+
+If this function returns @code{None}, this indicates to @value{GDBN}
+that this sub-class doesn't wish to disassemble the requested
+instruction.  @value{GDBN} will then use its builtin disassembler to
+perform the disassembly.
+
+Alternatively, this function can return a @code{DisassemblerResult}
+that represents the disassembled instruction; this type is described
+in more detail below.
+
+The @code{__call__} method can raise a @code{gdb.MemoryError}
+exception (@pxref{Exception Handling}) to indicate to @value{GDBN}
+that there was a problem accessing the required memory; this will then
+be displayed by @value{GDBN} within the disassembler output.
+
+Ideally, the only three outcomes from invoking @code{__call__} would
+be a return of @code{None}, a successful disassembly returned in a
+@code{DisassemblerResult}, or a @code{MemoryError} indicating that
+there was a problem reading memory.
+
+However, an implementation of @code{__call__} could fail for other
+reasons, e.g.@: some external resource required to perform
+disassembly is temporarily unavailable.  In this case, if
+@code{__call__} raises a @code{GdbError}, the exception will be
+converted to a string and printed at the end of the disassembly
+output, and the disassembly request will then stop.
+
+Any other exception type raised by the @code{__call__} method is
+considered an error in the user code; the exception will be printed to
+the error stream according to the @kbd{set python print-stack} setting
+(@pxref{set_python_print_stack,,@kbd{set python print-stack}}).
+@end defun
+@end deftp
+
+@deftp {class} DisassemblerResult
+This class is used to hold the result of calling
+@w{@code{Disassembler.__call__}}, and represents a single disassembled
+instruction.  This class has the following properties and methods:
+
+@defun DisassemblerResult.__init__ (length, string)
+Initialize an instance of this class.  @var{length} is the length of
+the disassembled instruction in bytes, which must be greater than
+zero, and @var{string} is a non-empty string that represents the
+disassembled instruction.
+@end defun
+
+@defvar DisassemblerResult.length
+A read-only property containing the length of the disassembled
+instruction in bytes; this will always be greater than zero.
+@end defvar
+
+@defvar DisassemblerResult.string
+A read-only property containing a non-empty string representing the
+disassembled instruction.
+@end defvar
+@end deftp
+
+The following functions are also contained in the
+@code{gdb.disassembler} module:
+
+@defun register_disassembler (disassembler, architecture)
+The @var{disassembler} must be an instance of a sub-class of
+@code{gdb.disassembler.Disassembler}, or @code{None}.
+
+The optional @var{architecture} is either a string, or the value
+@code{None}.  If it is a string, then it should be the name of an
+architecture known to @value{GDBN}, as returned either from
+@code{gdb.Architecture.name}
+(@pxref{gdbpy_architecture_name,,gdb.Architecture.name}), or from
+@code{gdb.architecture_names}
+(@pxref{gdb_architecture_names,,gdb.architecture_names}).
+
+The @var{disassembler} will be installed for the architecture named by
+@var{architecture}, or if @var{architecture} is @code{None}, then
+@var{disassembler} will be installed as a global disassembler for use
+by all architectures.
+
+@cindex disassembler in Python, global vs.@: specific
+@cindex search order for disassembler in Python
+@cindex look up of disassembler in Python
+@value{GDBN} only records a single disassembler for each architecture,
+and a single global disassembler.  Calling
+@code{register_disassembler} for an architecture, or for the global
+disassembler, will replace any existing disassembler registered for
+that @var{architecture} value.  The previous disassembler is returned.
+
+If @var{disassembler} is @code{None} then any disassembler currently
+registered for @var{architecture} is deregistered and returned.
+
+When @value{GDBN} is looking for a disassembler to use, @value{GDBN}
+first looks for an architecture specific disassembler.  If none has
+been registered then @value{GDBN} looks for a global disassembler (one
+registered with @var{architecture} set to @code{None}).  Only one
+disassembler is called to perform disassembly, so, if there is both an
+architecture specific disassembler, and a global disassembler
+registered, it is the architecture specific disassembler that will be
+used.
+
+@value{GDBN} tracks the architecture specific, and global
+disassemblers separately, so it doesn't matter in which order
+disassemblers are created or registered; an architecture specific
+disassembler, if present, will always be used in preference to a
+global disassembler.
+
+You can use the @kbd{maint info python-disassemblers} command
+(@pxref{maint info python-disassemblers}) to see which disassemblers
+have been registered.
+@end defun
+
+@anchor{builtin_disassemble}
+@defun builtin_disassemble (info)
+This function calls back into @value{GDBN}'s builtin disassembler to
+disassemble the instruction identified by @var{info}, an instance, or
+sub-class, of @code{DisassembleInfo}.
+
+When the builtin disassembler needs to read memory the
+@code{read_memory} method on @var{info} will be called.  By
+sub-classing @code{DisassembleInfo} and overriding the
+@code{read_memory} method, it is possible to intercept calls to
+@code{read_memory} from the builtin disassembler, and to modify the
+values returned.
+
+It is important to understand that, even when
+@code{DisassembleInfo.read_memory} raises a @code{gdb.MemoryError}, it
+is the internal disassembler itself that reports the memory error to
+@value{GDBN}.  The reason for this is that the disassembler might
+probe memory to see if a byte is readable or not; if the byte can't be
+read then the disassembler may choose not to report an error, but
+instead to disassemble the bytes that it does have available.
+
+If the builtin disassembler is successful then an instance of
+@code{DisassemblerResult} is returned from @code{builtin_disassemble}.
+Alternatively, if something goes wrong, an exception will be raised.
+
+A @code{MemoryError} will be raised if @code{builtin_disassemble} is
+unable to read some memory that is required in order to perform
+disassembly correctly.
+
+Any exception that is not a @code{MemoryError}, that is raised in a
+call to @code{read_memory}, will pass through
+@code{builtin_disassemble}, and be visible to the caller.
+
+Finally, there are a few cases where @value{GDBN}'s builtin
+disassembler can fail for reasons that are not covered by
+@code{MemoryError}.  In these cases, a @code{GdbError} will be raised.
+The contents of the exception will be a string describing the problem
+the disassembler encountered.
+@end defun
+
+Here is an example that registers a global disassembler.  The new
+disassembler invokes the builtin disassembler, and then adds a
+comment, @code{## Comment}, to each line of disassembly output:
+
+@smallexample
+class ExampleDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("ExampleDisassembler")
+
+    def __call__(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        text = result.string + "\t## Comment"
+        return gdb.disassembler.DisassemblerResult(length, text)
+
+gdb.disassembler.register_disassembler(ExampleDisassembler())
+@end smallexample
+
+The following example creates a sub-class of @code{DisassembleInfo} in
+order to intercept the @code{read_memory} calls.  Within
+@code{read_memory} any bytes read from memory have the two 4-bit
+nibbles swapped around.  This isn't a very useful adjustment, but
+serves as an example.
+
+@smallexample
+class MyInfo(gdb.disassembler.DisassembleInfo):
+    def __init__(self, info):
+        super().__init__(info)
+
+    def read_memory(self, length, offset):
+        buffer = super().read_memory(length, offset)
+        result = bytearray()
+        for b in buffer:
+            v = int.from_bytes(b, 'little')
+            v = (v << 4) & 0xf0 | (v >> 4)
+            result.append(v)
+        return memoryview(result)
+
+class NibbleSwapDisassembler(gdb.disassembler.Disassembler):
+    def __init__(self):
+        super().__init__("NibbleSwapDisassembler")
+
+    def __call__(self, info):
+        info = MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+gdb.disassembler.register_disassembler(NibbleSwapDisassembler())
+@end smallexample
+
 @node Python Auto-loading
 @subsection Python Auto-loading
 @cindex Python auto-loading
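Note on the read_memory text above: the point that a memory error
raised during a probe need not surface from builtin_disassemble can be
shown with a toy model.  Everything here is an assumption made for
illustration: FakeMemoryError stands in for gdb.MemoryError, FakeInfo
for DisassembleInfo, and fake_builtin_disassemble is an invented
decoder that prefers a 4-byte form and falls back to a 2-byte form.

```python
class FakeMemoryError(Exception):
    """Stand-in for gdb.MemoryError (hypothetical, illustration only)."""


class FakeInfo:
    """Stand-in for DisassembleInfo, backed by a bytes object."""

    def __init__(self, data):
        self._data = data

    def read_memory(self, length, offset=0):
        # Refuse any read that runs past the end of the available bytes.
        if offset + length > len(self._data):
            raise FakeMemoryError(offset + length - 1)
        return self._data[offset:offset + length]


def fake_builtin_disassemble(info):
    """Toy decoder: probe for 4 bytes, fall back to 2 bytes."""
    try:
        raw = info.read_memory(4, 0)
    except FakeMemoryError:
        # The probe failed, but that is not yet an error for the caller.
        # Only if the short form is unreadable too does the error escape.
        raw = info.read_memory(2, 0)
    return len(raw), "insn " + raw.hex()
```

With two readable bytes the first read_memory call fails and the result
is still a complete disassembly; with a single byte both reads fail and
the error reaches the caller.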
diff --git a/gdb/python/lib/gdb/disassembler.py b/gdb/python/lib/gdb/disassembler.py
new file mode 100644
index 00000000000..5a2d94a5fac
--- /dev/null
+++ b/gdb/python/lib/gdb/disassembler.py
@@ -0,0 +1,178 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+"""Disassembler related module."""
+
+import gdb
+import _gdb.disassembler
+
+# Re-export everything from the _gdb.disassembler module, which is
+# defined within GDB's C++ code.
+from _gdb.disassembler import *
+
+# Module global dictionary of gdb.disassembler.Disassembler objects.
+# The keys of this dictionary are bfd architecture names, or the
+# special value None.
+#
+# When a request to disassemble comes in we first lookup the bfd
+# architecture name from the gdbarch, if that name exists in this
+# dictionary then we use that Disassembler object.
+#
+# If there's no architecture specific disassembler then we look for
+# the key None in this dictionary, and if that key exists, we use that
+# disassembler.
+#
+# If none of the above checks found a suitable disassembler, then no
+# disassembly is performed in Python.
+_disassemblers_dict = {}
+
+
+class Disassembler(object):
+    """A base class from which all user implemented disassemblers must
+    inherit."""
+
+    def __init__(self, name):
+        """Constructor.  Takes a name, which should be a string, which can be
+        used to identify this disassembler in diagnostic messages."""
+        self.name = name
+
+    def __call__(self, info):
+        """A default implementation of __call__.  All sub-classes must
+        override this method.  Calling this default implementation will throw
+        a NotImplementedError exception."""
+        raise NotImplementedError("Disassembler.__call__")
+
+
+def register_disassembler(disassembler, architecture=None):
+    """Register a disassembler.  DISASSEMBLER is an instance of a
+    sub-class of gdb.disassembler.Disassembler, or None.  ARCHITECTURE
+    is either None or a string, the name of an architecture known to GDB.
+
+    DISASSEMBLER is registered as a disassembler for ARCHITECTURE, or
+    all architectures when ARCHITECTURE is None.
+
+    Returns the previous disassembler registered with this
+    ARCHITECTURE value.
+    """
+
+    if not isinstance(disassembler, Disassembler) and disassembler is not None:
+        raise TypeError("disassembler should sub-class gdb.disassembler.Disassembler")
+
+    old = None
+    if architecture in _disassemblers_dict:
+        old = _disassemblers_dict[architecture]
+        del _disassemblers_dict[architecture]
+    if disassembler is not None:
+        _disassemblers_dict[architecture] = disassembler
+
+    # Call the private _set_enabled function within the
+    # _gdb.disassembler module.  This function sets a global flag
+    # within GDB's C++ code that enables or disables the Python
+    # disassembler functionality; this improves performance of the
+    # disassembler by avoiding unneeded calls into Python when we know
+    # that no disassemblers are registered.
+    _gdb.disassembler._set_enabled(len(_disassemblers_dict) > 0)
+    return old
+
+
+def _print_insn(info):
+    """This function is called by GDB when it wants to disassemble an
+    instruction.  INFO describes the instruction to be
+    disassembled."""
+
+    def lookup_disassembler(arch):
+        try:
+            name = arch.name()
+            if name is None:
+                return None
+            if name in _disassemblers_dict:
+                return _disassemblers_dict[name]
+            if None in _disassemblers_dict:
+                return _disassemblers_dict[None]
+            return None
+        except Exception:
+            # It's pretty unlikely this exception case will ever
+            # trigger, one situation would be if the user somehow
+            # corrupted the _disassemblers_dict variable such that it
+            # was no longer a dictionary.
+            return None
+
+    disassembler = lookup_disassembler(info.architecture)
+    if disassembler is None:
+        return None
+    return disassembler(info)
+
+
+class maint_info_py_disassemblers_cmd(gdb.Command):
+    """
+    List all registered Python disassemblers.
+
+    List the name of all registered Python disassemblers, next to the
+    name of the architecture for which the disassembler is registered.
+
+    The global Python disassembler is listed next to the string
+    'GLOBAL'.
+
+    The disassembler that matches the architecture of the currently
+    selected inferior will be marked, this is an indication of which
+    disassembler will be invoked if any disassembly is performed in
+    the current inferior.
+    """
+
+    def __init__(self):
+        super().__init__("maintenance info python-disassemblers", gdb.COMMAND_USER)
+
+    def invoke(self, args, from_tty):
+        # If no disassemblers are registered, tell the user.
+        if len(_disassemblers_dict) == 0:
+            print("No Python disassemblers registered.")
+            return
+
+        # Figure out the longest architecture name, so we can
+        # correctly format the table of results.
+        longest_arch_name = 0
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                # Measure the architecture name, not the disassembler name.
+                if len(architecture) > longest_arch_name:
+                    longest_arch_name = len(architecture)
+
+        # Figure out the name of the current architecture.  There
+        # should always be a current inferior, but if, somehow, there
+        # isn't, then leave curr_arch as the empty string, which will
+        # not then match against any architecture in the dictionary.
+        curr_arch = ""
+        if gdb.selected_inferior() is not None:
+            curr_arch = gdb.selected_inferior().architecture().name()
+
+        # Now print the dictionary of registered disassemblers out to
+        # the user.
+        match_tag = "\t(Matches current architecture)"
+        fmt_len = max(longest_arch_name, len("Architecture"))
+        format_string = "{:" + str(fmt_len) + "s} {:s}"
+        print(format_string.format("Architecture", "Disassember Name"))
+        for architecture in _disassemblers_dict:
+            if architecture is not None:
+                name = _disassemblers_dict[architecture].name
+                if architecture == curr_arch:
+                    name += match_tag
+                    match_tag = ""
+                print(format_string.format(architecture, name))
+        if None in _disassemblers_dict:
+            name = _disassemblers_dict[None].name + match_tag
+            print(format_string.format("GLOBAL", name))
+
+
+maint_info_py_disassemblers_cmd()
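Note on disassembler.py above: the register and lookup rules can be
exercised without GDB using a small model.  This is a sketch under
assumptions, not the module itself: register and lookup are
hypothetical names, plain strings stand in for Disassembler objects,
and the call into _gdb.disassembler._set_enabled is left out.

```python
_registry = {}


def register(disassembler, architecture=None):
    """Install DISASSEMBLER for ARCHITECTURE (None means global) and
    return whatever was registered there before.  Passing None as
    DISASSEMBLER removes the existing entry."""
    old = _registry.pop(architecture, None)
    if disassembler is not None:
        _registry[architecture] = disassembler
    return old


def lookup(arch_name):
    """Prefer an architecture specific entry, then the global one."""
    if arch_name is None:
        return None
    if arch_name in _registry:
        return _registry[arch_name]
    return _registry.get(None)
```

Because the specific and global entries live under different keys, the
order of registration does not affect which one lookup returns.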
diff --git a/gdb/python/py-disasm.c b/gdb/python/py-disasm.c
new file mode 100644
index 00000000000..4c78ca350c2
--- /dev/null
+++ b/gdb/python/py-disasm.c
@@ -0,0 +1,1090 @@
+/* Python interface to instruction disassembly.
+
+   Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+   This file is part of GDB.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+#include "defs.h"
+#include "python-internal.h"
+#include "dis-asm.h"
+#include "arch-utils.h"
+#include "charset.h"
+#include "disasm.h"
+#include "progspace.h"
+
+/* Implement gdb.disassembler.DisassembleInfo type.  An object of this type
+   represents a single disassembler request from GDB.  */
+
+struct disasm_info_object
+{
+  PyObject_HEAD
+
+  /* The architecture in which we are disassembling.  */
+  struct gdbarch *gdbarch;
+
+  /* The program_space in which we are disassembling.  */
+  struct program_space *program_space;
+
+  /* Address of the instruction to disassemble.  */
+  bfd_vma address;
+
+  /* The disassemble_info passed from core GDB; this contains the
+     callbacks necessary to read the instruction from core GDB, and to
+     print the disassembled instruction.  */
+  disassemble_info *gdb_info;
+
+  /* If copies of this object are created then they are chained together
+     via this NEXT pointer; this allows all the copies to be invalidated at
+     the same time as the parent object.  */
+  struct disasm_info_object *next;
+};
+
+extern PyTypeObject disasm_info_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_info_object");
+
+/* Implement gdb.disassembler.DisassemblerResult type, an object that holds
+   the result of calling the disassembler.  This is mostly the length of
+   the disassembled instruction (in bytes), and the string representing the
+   disassembled instruction.  */
+
+struct disasm_result_object
+{
+  PyObject_HEAD
+
+  /* The length of the disassembled instruction in bytes.  */
+  int length;
+
+  /* A buffer which, when allocated, holds the disassembled content of an
+     instruction.  */
+  string_file *content;
+};
+
+extern PyTypeObject disasm_result_object_type
+    CPYCHECKER_TYPE_OBJECT_FOR_TYPEDEF ("disasm_result_object");
+
+/* When this is false we fast path out of gdbpy_print_insn, which should
+   keep the performance impact of the Python disassembler down.  This is
+   set to true from Python by calling gdb.disassembler._set_enabled() when
+   the user registers a disassembler.  */
+
+static bool python_print_insn_enabled = false;
+
+/* A sub-class of gdb_printing_disassembler that holds a pointer to a
+   Python DisassembleInfo object.  A pointer to an instance of this class
+   is placed in the application_data field of the disassemble_info that
+   is used when we call gdbarch_print_insn.  */
+
+struct gdbpy_disassembler : public gdb_printing_disassembler
+{
+  /* Constructor.  */
+  gdbpy_disassembler (disasm_info_object *obj, PyObject *memory_source);
+
+  /* Get the DisassembleInfo object pointer.  */
+  disasm_info_object *
+  py_disasm_info () const
+  {
+    return m_disasm_info_object;
+  }
+
+  /* Callbacks used by disassemble_info.  */
+  static void memory_error_func (int status, bfd_vma memaddr,
+				 struct disassemble_info *info);
+  static void print_address_func (bfd_vma addr,
+				  struct disassemble_info *info);
+  static int read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+			       unsigned int len,
+			       struct disassemble_info *info);
+
+  /* Return a reference to an optional that contains the address at which a
+     memory error occurred.  The optional will only have a value if a
+     memory error actually occurred.  */
+  const gdb::optional<CORE_ADDR> &memory_error_address () const
+  { return m_memory_error_address; }
+
+  /* Return the content of the disassembler as a string.  The contents are
+     moved out of the disassembler, so after this call the disassembler
+     contents have been reset back to empty.  */
+  std::string release ()
+  {
+    return m_string_file.release ();
+  }
+
+private:
+
+  /* Where the disassembler result is written.  */
+  string_file m_string_file;
+
+  /* The DisassembleInfo object we are disassembling for.  */
+  disasm_info_object *m_disasm_info_object;
+
+  /* When the user indicates that a memory error has occurred then the
+     address of the memory error is stored in here.  */
+  gdb::optional<CORE_ADDR> m_memory_error_address;
+
+  /* When the user calls the builtin_disassemble function, if they pass a
+     memory source object then a pointer to the object is placed in here,
+     otherwise, this field is nullptr.  */
+  PyObject *m_memory_source;
+};
+
+/* Return true if OBJ is still valid, otherwise, return false.  A valid OBJ
+   will have a non-nullptr gdb_info field.  */
+
+static bool
+disasm_info_object_is_valid (disasm_info_object *obj)
+{
+  return obj->gdb_info != nullptr;
+}
+
+/* Fill in OBJ with all the other arguments.  */
+
+static void
+disasm_info_fill (disasm_info_object *obj, struct gdbarch *gdbarch,
+		  program_space *progspace, bfd_vma address,
+		  disassemble_info *di, disasm_info_object *next)
+{
+  obj->gdbarch = gdbarch;
+  obj->program_space = progspace;
+  obj->address = address;
+  obj->gdb_info = di;
+  obj->next = next;
+}
+
+/* Implement DisassembleInfo.__init__.  Takes a single argument that must
+   be another DisassembleInfo object and copies the contents from the
+   argument into this new object.  */
+
+static int
+disasm_info_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "info", NULL };
+  PyObject *info_obj;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "O!", keywords,
+					&disasm_info_object_type,
+					&info_obj))
+    return -1;
+
+  disasm_info_object *other = (disasm_info_object *) info_obj;
+  disasm_info_object *info = (disasm_info_object *) self;
+  disasm_info_fill (info, other->gdbarch, other->program_space,
+		    other->address, other->gdb_info, other->next);
+  other->next = info;
+
+  /* As the OTHER object now holds a pointer to INFO we inc the ref count
+     on INFO.  This stops INFO being deleted until OTHER has gone away.  */
+  Py_INCREF ((PyObject *) info);
+  return 0;
+}
+
+/* The tp_dealloc callback for the DisassembleInfo type.  */
+
+static void
+disasm_info_dealloc (PyObject *self)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+
+  /* We no longer care about the object our NEXT pointer points at, so we
+     can decrement its reference count.  This macro handles the case when
+     NEXT is nullptr.  */
+  Py_XDECREF ((PyObject *) obj->next);
+
+  /* Now core deallocation behaviour.  */
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* Implement DisassembleInfo.is_valid(), really just a wrapper around the
+   disasm_info_object_is_valid function above.  */
+
+static PyObject *
+disasmpy_info_is_valid (PyObject *self, PyObject *args)
+{
+  disasm_info_object *disasm_obj = (disasm_info_object *) self;
+
+  if (disasm_info_object_is_valid (disasm_obj))
+    Py_RETURN_TRUE;
+
+  Py_RETURN_FALSE;
+}
+
+/* Set the Python exception to be a gdb.MemoryError object, with ADDRESS
+   as its payload.  */
+
+static void
+disasmpy_set_memory_error_for_address (CORE_ADDR address)
+{
+  gdbpy_ref<> address_obj = gdb_py_object_from_longest (address);
+  PyErr_SetObject (gdbpy_gdb_memory_error, address_obj.get ());
+}
+
+/* Ensure that a gdb.disassembler.DisassembleInfo is valid.  */
+
+#define DISASMPY_DISASM_INFO_REQUIRE_VALID(Info)			\
+  do {									\
+    if (!disasm_info_object_is_valid (Info))				\
+      {									\
+	PyErr_SetString (PyExc_RuntimeError,				\
+			 _("DisassembleInfo is no longer valid."));	\
+	return nullptr;							\
+      }									\
+  } while (0)
+
+/* Initialise OBJ, a DisassemblerResult object with LENGTH and CONTENT.
+   OBJ might already have been initialised, in which case any existing
+   content should be discarded before the new CONTENT is moved in.  */
+
+static void
+disasmpy_init_disassembler_result (disasm_result_object *obj, int length,
+				   std::string content)
+{
+  if (obj->content == nullptr)
+    obj->content = new string_file;
+  else
+    obj->content->clear ();
+
+  obj->length = length;
+  *(obj->content) = std::move (content);
+}
+
+/* Implement gdb.disassembler.builtin_disassemble().  Calls back into GDB's
+   builtin disassembler.  The first argument is a DisassembleInfo object
+   describing what to disassemble.  The second argument is optional and
+   provides a mechanism to modify the memory contents that the builtin
+   disassembler will actually disassemble.
+
+   Returns an instance of gdb.disassembler.DisassemblerResult, an object
+   that wraps a disassembled instruction, or it raises a
+   gdb.MemoryError.  */
+
+static PyObject *
+disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *info_obj, *memory_source_obj = nullptr;
+  static const char *keywords[] = { "info", "memory_source", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
+					&disasm_info_object_type, &info_obj,
+					&memory_source_obj))
+    return nullptr;
+
+  disasm_info_object *disasm_info = (disasm_info_object *) info_obj;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (disasm_info);
+
+  /* Where the result will be written.  */
+  gdbpy_disassembler disassembler (disasm_info, memory_source_obj);
+
+  /* Now actually perform the disassembly.  LENGTH is set to the length of
+     the disassembled instruction, or -1 if there was a memory-error
+     encountered while disassembling.  See below for more details on
+     handling of -1 return value.  */
+  int length;
+  try
+    {
+      length = gdbarch_print_insn (disasm_info->gdbarch, disasm_info->address,
+				   disassembler.disasm_info ());
+    }
+  catch (gdbpy_err_fetch &pyerr)
+    {
+      /* Reinstall the Python exception held in PYERR.  This clears the
+	 pointers held in PYERR, hence the need to catch as a non-const
+	 reference.  */
+      pyerr.restore ();
+      return nullptr;
+    }
+
+  if (length == -1)
+    {
+
+      /* In an ideal world, every disassembler should always call the
+	 memory error function before returning a status of -1 as the only
+	 error a disassembler should encounter is a failure to read
+	 memory.  Unfortunately, there are some disassemblers who don't
+	 follow this rule, and will return -1 without calling the memory
+	 error function.
+
+	 To make the Python API simpler, we just classify everything as a
+	 memory error, but the message has to be modified for the case
+	 where the disassembler didn't call the memory error function.  */
+      if (disassembler.memory_error_address ().has_value ())
+	{
+	  CORE_ADDR addr = *disassembler.memory_error_address ();
+	  disasmpy_set_memory_error_for_address (addr);
+	}
+      else
+	{
+	  std::string content = disassembler.release ();
+	  if (!content.empty ())
+	    PyErr_SetString (gdbpy_gdberror_exc, content.c_str ());
+	  else
+	    PyErr_SetString (gdbpy_gdberror_exc,
+			     _("Unknown disassembly error."));
+	}
+      return nullptr;
+    }
+
+  /* Instructions are either non-zero in length, or we got an error,
+     indicated by a length of -1, which we handled above.  */
+  gdb_assert (length > 0);
+
+  /* We should not have seen a memory error in this case.  */
+  gdb_assert (!disassembler.memory_error_address ().has_value ());
+
+  /* Create a DisassemblerResult containing the results.  */
+  std::string content = disassembler.release ();
+  PyTypeObject *type = &disasm_result_object_type;
+  gdbpy_ref<disasm_result_object> res
+    ((disasm_result_object *) type->tp_alloc (type, 0));
+  disasmpy_init_disassembler_result (res.get (), length, std::move (content));
+  return reinterpret_cast<PyObject *> (res.release ());
+}
+
+/* Implement _gdb.disassembler._set_enabled.  Takes a boolean parameter, and
+   sets whether GDB should enter the Python disassembler code or not.
+
+   This is called from within the Python code when a new disassembler is
+   registered.  When no disassemblers are registered the global C++ flag
+   is set to false, and GDB never even enters the Python environment to
+   check for a disassembler.
+
+   When the user registers a new Python disassembler, the global C++ flag
+   is set to true, and now GDB will enter the Python environment to check
+   if there's a disassembler registered for the current architecture.  */
+
+static PyObject *
+disasmpy_set_enabled (PyObject *self, PyObject *args, PyObject *kw)
+{
+  PyObject *newstate;
+  static const char *keywords[] = { "state", nullptr };
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O", keywords,
+					&newstate))
+    return nullptr;
+
+  if (!PyBool_Check (newstate))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("The value passed to `_set_enabled' must be a boolean."));
+      return nullptr;
+    }
+
+  python_print_insn_enabled = PyObject_IsTrue (newstate);
+  Py_RETURN_NONE;
+}
+
+/* Implement DisassembleInfo.read_memory(LENGTH, OFFSET).  Read LENGTH
+   bytes at OFFSET from the start of the instruction currently being
+   disassembled, and return a memory buffer containing the bytes.
+
+   OFFSET defaults to zero if it is not provided.  LENGTH is required.  If
+   the read fails then this will raise a gdb.MemoryError exception.  */
+
+static PyObject *
+disasmpy_info_read_memory (PyObject *self, PyObject *args, PyObject *kw)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+
+  LONGEST length, offset = 0;
+  gdb::unique_xmalloc_ptr<gdb_byte> buffer;
+  static const char *keywords[] = { "length", "offset", nullptr };
+
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "L|L", keywords,
+					&length, &offset))
+    return nullptr;
+
+  /* The apparent address from which we are reading memory.  Note that in
+     some cases GDB actually disassembles instructions from a buffer, so
+     we might not actually be reading this information directly from the
+     inferior memory.  This is all hidden behind the read_memory_func API
+     within the disassemble_info structure.  */
+  CORE_ADDR address = obj->address + offset;
+
+  /* Setup a buffer to hold the result.  */
+  buffer.reset ((gdb_byte *) xmalloc (length));
+
+  /* Read content into BUFFER.  If the read fails then raise a memory
+     error, otherwise, convert BUFFER to a Python memory buffer, and return
+     it to the user.  */
+  disassemble_info *info = obj->gdb_info;
+  if (info->read_memory_func ((bfd_vma) address, buffer.get (),
+			      (unsigned int) length, info) != 0)
+    {
+      disasmpy_set_memory_error_for_address (address);
+      return nullptr;
+    }
+  return gdbpy_buffer_to_membuf (std::move (buffer), address, length);
+}
+
+/* Implement DisassembleInfo.address attribute, return the address at which
+   GDB would like an instruction disassembled.  */
+
+static PyObject *
+disasmpy_info_address (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdb_py_object_from_longest (obj->address).release ();
+}
+
+/* Implement DisassembleInfo.architecture attribute.  Return the
+   gdb.Architecture in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_architecture (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return gdbarch_to_arch_object (obj->gdbarch);
+}
+
+/* Implement DisassembleInfo.progspace attribute.  Return the
+   gdb.Progspace in which we are disassembling.  */
+
+static PyObject *
+disasmpy_info_progspace (PyObject *self, void *closure)
+{
+  disasm_info_object *obj = (disasm_info_object *) self;
+  DISASMPY_DISASM_INFO_REQUIRE_VALID (obj);
+  return pspace_to_pspace_object (obj->program_space).release ();
+}
+
+/* This implements the disassemble_info read_memory_func callback and is
+   called from the libopcodes disassembler when the disassembler wants to
+   read memory.
+
+   From the INFO argument we can find the gdbpy_disassembler object for
+   which we are disassembling, and from that object we can find the
+   DisassembleInfo for the current disassembly call.
+
+   This function reads the instruction bytes by calling the read_memory
+   method on the DisassembleInfo object.  This method might have been
+   overridden by user code.
+
+   Read LEN bytes from MEMADDR and place them into BUFF.  Return 0 on
+   success (in which case BUFF has been filled), or -1 on error, in which
+   case the contents of BUFF are undefined.  */
+
+int
+gdbpy_disassembler::read_memory_func (bfd_vma memaddr, gdb_byte *buff,
+				      unsigned int len,
+				      struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  disasm_info_object *obj = dis->py_disasm_info ();
+
+  /* The DisassembleInfo.read_memory method expects an offset from the
+     address stored within the DisassembleInfo object; calculate that
+     offset here.  */
+  LONGEST offset = (LONGEST) memaddr - (LONGEST) obj->address;
+
+  /* Now call the DisassembleInfo.read_memory method.  This might have been
+     overridden by the user.  */
+  gdbpy_ref<> result_obj (PyObject_CallMethod ((PyObject *) obj,
+					       "read_memory",
+					       "IL", len, (long long) offset));
+
+  /* Handle any exceptions.  */
+  if (result_obj == nullptr)
+    {
+      /* If we got a gdb.MemoryError then we ignore this and just report
+	 that the read failed to the caller.  The caller is then
+	 responsible for calling the memory_error_func if it wants to.
+	 Remember, the disassembler might just be probing to see if these
+	 bytes can be read; if we automatically call the memory error
+	 function, we can end up registering an error prematurely.  */
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  PyErr_Clear ();
+	  return -1;
+	}
+
+      /* For any other exception type we capture the value of the Python
+	 exception and throw it, this will then be caught in
+	 disasmpy_builtin_disassemble, at which point the exception will be
+	 restored.  */
+      throw gdbpy_err_fetch ();
+    }
+
+  /* Convert the result to a buffer.  */
+  Py_buffer py_buff;
+  if (!PyObject_CheckBuffer (result_obj.get ())
+      || PyObject_GetBuffer (result_obj.get (), &py_buff, PyBUF_CONTIG_RO) < 0)
+    {
+      PyErr_Format (PyExc_TypeError,
+		    _("Result from read_memory is not a buffer"));
+      throw gdbpy_err_fetch ();
+    }
+
+  /* Wrap PY_BUFF so that it is cleaned up correctly at the end of this
+     scope.  */
+  Py_buffer_up buffer_up (&py_buff);
+
+  /* Validate that the buffer is the correct length.  */
+  if (py_buff.len != len)
+    {
+      PyErr_Format (PyExc_ValueError,
+		    _("Buffer returned from read_memory is sized %zd instead of the expected %u"),
+		    py_buff.len, len);
+      throw gdbpy_err_fetch ();
+    }
+
+  /* Copy the data out of the Python buffer and return success.  */
+  const gdb_byte *buffer = (const gdb_byte *) py_buff.buf;
+  memcpy (buff, buffer, len);
+  return 0;
+}
+
+/* Implement DisassemblerResult.length attribute, return the length of the
+   disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_length (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  return gdb_py_object_from_longest (obj->length).release ();
+}
+
+/* Implement DisassemblerResult.string attribute, return the content string
+   of the disassembled instruction.  */
+
+static PyObject *
+disasmpy_result_string (PyObject *self, void *closure)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+
+  gdb_assert (obj->content != nullptr);
+  gdb_assert (strlen (obj->content->c_str ()) > 0);
+  gdb_assert (obj->length > 0);
+  return PyUnicode_Decode (obj->content->c_str (),
+			   obj->content->size (),
+			   host_charset (), nullptr);
+}
+
+/* Implement DisassemblerResult.__init__.  Takes two arguments, an
+   integer, the length in bytes of the disassembled instruction, and a
+   string, the disassembled content of the instruction.  */
+
+static int
+disasmpy_result_init (PyObject *self, PyObject *args, PyObject *kwargs)
+{
+  static const char *keywords[] = { "length", "string", NULL };
+  int length;
+  const char *string;
+  if (!gdb_PyArg_ParseTupleAndKeywords (args, kwargs, "is", keywords,
+					&length, &string))
+    return -1;
+
+  if (length <= 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("Length must be greater than 0."));
+      return -1;
+    }
+
+  if (strlen (string) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String must not be empty."));
+      return -1;
+    }
+
+  disasm_result_object *obj = (disasm_result_object *) self;
+  disasmpy_init_disassembler_result (obj, length, std::string (string));
+
+  return 0;
+}
+
+/* Implement memory_error_func callback for disassemble_info.  Extract the
+   underlying DisassembleInfo Python object, and set a memory error on
+   it.  */
+
+void
+gdbpy_disassembler::memory_error_func (int status, bfd_vma memaddr,
+				       struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  dis->m_memory_error_address.emplace (memaddr);
+}
+
+/* Wrapper of print_address.  */
+
+void
+gdbpy_disassembler::print_address_func (bfd_vma addr,
+					struct disassemble_info *info)
+{
+  gdbpy_disassembler *dis
+    = static_cast<gdbpy_disassembler *> (info->application_data);
+  print_address (dis->arch (), addr, (struct ui_file *) info->stream);
+}
+
+/* Constructor.  */
+
+gdbpy_disassembler::gdbpy_disassembler (disasm_info_object *obj,
+					PyObject *memory_source)
+  : gdb_printing_disassembler (obj->gdbarch, &m_string_file,
+			       read_memory_func, memory_error_func,
+			       print_address_func),
+    m_disasm_info_object (obj),
+    m_memory_source (memory_source)
+{ /* Nothing.  */ }
+
+/* A wrapper around a reference to a Python DisassembleInfo object, which
+   ensures that the object is marked as invalid when we leave the enclosing
+   scope.
+
+   Each DisassembleInfo is created in gdbpy_print_insn, and is done with by
+   the time that function returns.  However, there's nothing to stop a user
+   caching a reference to the DisassembleInfo, and thus keeping the object
+   around.
+
+   We therefore have the notion of a DisassembleInfo becoming invalid, this
+   happens when gdbpy_print_insn returns.  This class is responsible for
+   marking the DisassembleInfo as invalid in its destructor.  */
+
+struct scoped_disasm_info_object
+{
+  /* Constructor.  */
+  scoped_disasm_info_object (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+			     disassemble_info *info)
+    : m_disasm_info (allocate_disasm_info_object ())
+  {
+    disasm_info_fill (m_disasm_info.get (), gdbarch, current_program_space,
+		      memaddr, info, nullptr);
+  }
+
+  /* Upon destruction mark m_disasm_info as invalid.  */
+  ~scoped_disasm_info_object ()
+  {
+    /* Invalidate the original DisassembleInfo object as well as any copies
+       that the user might have made.  */
+    for (disasm_info_object *obj = m_disasm_info.get ();
+	 obj != nullptr;
+	 obj = obj->next)
+      obj->gdb_info = nullptr;
+  }
+
+  /* Return a pointer to the underlying disasm_info_object instance.  */
+  disasm_info_object *
+  get () const
+  {
+    return m_disasm_info.get ();
+  }
+
+private:
+
+  /* Wrapper around the call to PyObject_New, this wrapper function can be
+     called from the constructor initialization list, while PyObject_New, a
+     macro, can't.  */
+  static disasm_info_object *
+  allocate_disasm_info_object ()
+  {
+    return (disasm_info_object *) PyObject_New (disasm_info_object,
+						&disasm_info_object_type);
+  }
+
+  /* A reference to a gdb.disassembler.DisassembleInfo object.  When this
+     containing instance goes out of scope this reference is released,
+     however, the user might be holding other references to the
+     DisassembleInfo object in Python code, so the underlying object might
+     not be deleted.  */
+  gdbpy_ref<disasm_info_object> m_disasm_info;
+};
+
+/* See python-internal.h.  */
+
+gdb::optional<int>
+gdbpy_print_insn (struct gdbarch *gdbarch, CORE_ADDR memaddr,
+		  disassemble_info *info)
+{
+  /* Early exit case.  This must be done as early as possible, and
+     definitely before we enter the Python environment.  The
+     python_print_insn_enabled flag is set (from Python) only when the user
+     has installed one (or more) Python disassemblers.  So in the common
+     case (no custom disassembler installed) this flag will be false,
+     allowing for a quick return.  */
+  if (!gdb_python_initialized || !python_print_insn_enabled)
+    return {};
+
+  gdbpy_enter enter_py (get_current_arch (), current_language);
+
+  /* Import the gdb.disassembler module.  */
+  gdbpy_ref<> gdb_python_disassembler_module
+    (PyImport_ImportModule ("gdb.disassembler"));
+  if (gdb_python_disassembler_module == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Get the _print_insn attribute from the module, this should be the
+     function we are going to call to actually perform the disassembly.  */
+  gdbpy_ref<> hook
+    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
+			     "_print_insn"));
+  if (hook == nullptr)
+    {
+      gdbpy_print_stack ();
+      return {};
+    }
+
+  /* Create the new DisassembleInfo object we will pass into Python.  This
+     object will be marked as invalid when we leave this scope.  */
+  scoped_disasm_info_object scoped_disasm_info (gdbarch, memaddr, info);
+  disasm_info_object *disasm_info = scoped_disasm_info.get ();
+
+  /* Call into the registered disassembler to (possibly) perform the
+     disassembly.  */
+  PyObject *insn_disas_obj = (PyObject *) disasm_info;
+  gdbpy_ref<> result (PyObject_CallFunctionObjArgs (hook.get (),
+						    insn_disas_obj,
+						    nullptr));
+
+  if (result == nullptr)
+    {
+      /* The call into Python code resulted in an exception.  If this was a
+	 gdb.MemoryError, then we can figure out an address and call the
+	 disassemble_info::memory_error_func to report the error back to
+	 core GDB.  Any other exception type we report back to core GDB as
+	 an unknown error (return -1 without first calling the
+	 memory_error_func callback).  */
+
+      if (PyErr_ExceptionMatches (gdbpy_gdb_memory_error))
+	{
+	  /* A gdb.MemoryError might have an address attribute which
+	     contains the address at which the memory error occurred.  If
+	     this is the case then use this address, otherwise, fallback to
+	     just using the address of the instruction we were asked to
+	     disassemble.  */
+	  gdbpy_err_fetch err;
+	  PyErr_Clear ();
+
+	  CORE_ADDR addr;
+	  if (err.value () != nullptr
+	      && PyObject_HasAttrString (err.value ().get (), "address"))
+	    {
+	      gdbpy_ref<> addr_obj
+		(PyObject_GetAttrString (err.value ().get (), "address"));
+	      if (get_addr_from_python (addr_obj.get (), &addr) < 0)
+		addr = disasm_info->address;
+	    }
+	  else
+	    addr = disasm_info->address;
+
+	  info->memory_error_func (-1, addr, info);
+	  return gdb::optional<int> (-1);
+	}
+      else if (PyErr_ExceptionMatches (gdbpy_gdberror_exc))
+	{
+	  gdbpy_err_fetch err;
+	  gdb::unique_xmalloc_ptr<char> msg = err.to_string ();
+
+	  info->fprintf_func (info->stream, "%s", msg.get ());
+	  return gdb::optional<int> (-1);
+	}
+      else
+	{
+	  gdbpy_print_stack ();
+	  return gdb::optional<int> (-1);
+	}
+
+    }
+  else if (result == Py_None)
+    {
+      /* A return value of None indicates that the Python code could not,
+	 or doesn't want to, disassemble this instruction.  Just return an
+	 empty result and core GDB will try to disassemble this for us.  */
+      return {};
+    }
+
+  /* Check the result is a DisassemblerResult (or a sub-class).  */
+  if (!PyObject_IsInstance (result.get (),
+			    (PyObject *) &disasm_result_object_type))
+    {
+      PyErr_SetString (PyExc_TypeError,
+		       _("Result is not a DisassemblerResult."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  /* The call into Python neither raised an exception, nor returned None.
+     Check to see if the result looks valid.  */
+  gdbpy_ref<> length_obj (PyObject_GetAttrString (result.get (), "length"));
+  if (length_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  gdbpy_ref<> string_obj (PyObject_GetAttrString (result.get (), "string"));
+  if (string_obj == nullptr)
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+  if (!gdbpy_is_string (string_obj.get ()))
+    {
+      PyErr_SetString (PyExc_TypeError, _("String attribute is not a string."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  gdb::unique_xmalloc_ptr<char> string
+    = gdbpy_obj_to_string (string_obj.get ());
+  if (string == nullptr)
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  long length;
+  if (!gdb_py_int_as_long (length_obj.get (), &length))
+    {
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  long max_insn_length = (gdbarch_max_insn_length_p (gdbarch) ?
+			  gdbarch_max_insn_length (gdbarch) : INT_MAX);
+  if (length <= 0)
+    {
+      PyErr_SetString
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length must be greater than 0."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+  if (length > max_insn_length)
+    {
+      PyErr_Format
+	(PyExc_ValueError,
+	 _("Invalid length attribute: length %ld greater than architecture maximum of %ld"),
+	 length, max_insn_length);
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  if (strlen (string.get ()) == 0)
+    {
+      PyErr_SetString (PyExc_ValueError,
+		       _("String attribute must not be empty."));
+      gdbpy_print_stack ();
+      return gdb::optional<int> (-1);
+    }
+
+  /* Print the disassembled instruction back to core GDB, and return the
+     length of the disassembled instruction.  */
+  info->fprintf_func (info->stream, "%s", string.get ());
+  return gdb::optional<int> (length);
+}
+
+/* The tp_dealloc callback for the DisassemblerResult type.  Takes care of
+   deallocating the content buffer.  */
+
+static void
+disasmpy_dealloc_result (PyObject *self)
+{
+  disasm_result_object *obj = (disasm_result_object *) self;
+  delete obj->content;
+  Py_TYPE (self)->tp_free (self);
+}
+
+/* The get/set attributes of the gdb.disassembler.DisassembleInfo type.  */
+
+static gdb_PyGetSetDef disasm_info_object_getset[] = {
+  { "address", disasmpy_info_address, nullptr,
+    "Start address of the instruction to disassemble.", nullptr },
+  { "architecture", disasmpy_info_architecture, nullptr,
+    "Architecture to disassemble in", nullptr },
+  { "progspace", disasmpy_info_progspace, nullptr,
+    "Program space to disassemble in", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* The methods of the gdb.disassembler.DisassembleInfo type.  */
+
+static PyMethodDef disasm_info_object_methods[] = {
+  { "read_memory", (PyCFunction) disasmpy_info_read_memory,
+    METH_VARARGS | METH_KEYWORDS,
+    "read_memory (LEN, OFFSET = 0) -> Octets[]\n\
+Read LEN octets for the instruction to disassemble." },
+  { "is_valid", disasmpy_info_is_valid, METH_NOARGS,
+    "is_valid () -> Boolean.\n\
+Return true if this DisassembleInfo is valid, false if not." },
+  {nullptr}  /* Sentinel */
+};
+
+/* The get/set attributes of the gdb.disassembler.DisassemblerResult type.  */
+
+static gdb_PyGetSetDef disasm_result_object_getset[] = {
+  { "length", disasmpy_result_length, nullptr,
+    "Length of the disassembled instruction.", nullptr },
+  { "string", disasmpy_result_string, nullptr,
+    "String representing the disassembled instruction.", nullptr },
+  { nullptr }   /* Sentinel */
+};
+
+/* These are the methods we add into the _gdb.disassembler module, which
+   are then imported into the gdb.disassembler module.  These are global
+   functions that support performing disassembly.  */
+
+PyMethodDef python_disassembler_methods[] =
+{
+  { "builtin_disassemble", (PyCFunction) disasmpy_builtin_disassemble,
+    METH_VARARGS | METH_KEYWORDS,
+    "builtin_disassemble (INFO, MEMORY_SOURCE = None) -> DisassemblerResult\n\
+Disassemble using GDB's builtin disassembler.  INFO is an instance of\n\
+gdb.disassembler.DisassembleInfo.  The MEMORY_SOURCE, if not None, should\n\
+be an object with the read_memory method." },
+  { "_set_enabled", (PyCFunction) disasmpy_set_enabled,
+    METH_VARARGS | METH_KEYWORDS,
+    "_set_enabled (STATE) -> None\n\
+Set whether GDB should call into the Python _print_insn code or not." },
+  {nullptr, nullptr, 0, nullptr}
+};
+
+/* Structure to define the _gdb.disassembler module.  */
+
+static struct PyModuleDef python_disassembler_module_def =
+{
+  PyModuleDef_HEAD_INIT,
+  "_gdb.disassembler",
+  nullptr,
+  -1,
+  python_disassembler_methods,
+  nullptr,
+  nullptr,
+  nullptr,
+  nullptr
+};
+
+/* Called to initialize the Python structures in this file.  */
+
+int
+gdbpy_initialize_disasm ()
+{
+  /* Create the _gdb.disassembler module, and add it to the _gdb module.  */
+
+  PyObject *gdb_disassembler_module;
+  gdb_disassembler_module = PyModule_Create (&python_disassembler_module_def);
+  if (gdb_disassembler_module == nullptr)
+    return -1;
+  PyModule_AddObject (gdb_module, "disassembler", gdb_disassembler_module);
+
+  /* This is needed so that 'import _gdb.disassembler' will work.  */
+  PyObject *dict = PyImport_GetModuleDict ();
+  PyDict_SetItemString (dict, "_gdb.disassembler", gdb_disassembler_module);
+
+  disasm_info_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_info_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassembleInfo",
+			      (PyObject *) &disasm_info_object_type) < 0)
+    return -1;
+
+  disasm_result_object_type.tp_new = PyType_GenericNew;
+  if (PyType_Ready (&disasm_result_object_type) < 0)
+    return -1;
+
+  if (gdb_pymodule_addobject (gdb_disassembler_module, "DisassemblerResult",
+			      (PyObject *) &disasm_result_object_type) < 0)
+    return -1;
+
+  return 0;
+}
+
+/* Describe the gdb.disassembler.DisassembleInfo type.  */
+
+PyTypeObject disasm_info_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassembleInfo",		/*tp_name*/
+  sizeof (disasm_info_object),			/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasm_info_dealloc,				/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB instruction disassembler object",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  disasm_info_object_methods,			/* tp_methods */
+  0,						/* tp_members */
+  disasm_info_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasm_info_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
+
+/* Describe the gdb.disassembler.DisassemblerResult type.  */
+
+PyTypeObject disasm_result_object_type = {
+  PyVarObject_HEAD_INIT (nullptr, 0)
+  "gdb.disassembler.DisassemblerResult",	/*tp_name*/
+  sizeof (disasm_result_object),		/*tp_basicsize*/
+  0,						/*tp_itemsize*/
+  disasmpy_dealloc_result,			/*tp_dealloc*/
+  0,						/*tp_print*/
+  0,						/*tp_getattr*/
+  0,						/*tp_setattr*/
+  0,						/*tp_compare*/
+  0,						/*tp_repr*/
+  0,						/*tp_as_number*/
+  0,						/*tp_as_sequence*/
+  0,						/*tp_as_mapping*/
+  0,						/*tp_hash */
+  0,						/*tp_call*/
+  0,						/*tp_str*/
+  0,						/*tp_getattro*/
+  0,						/*tp_setattro*/
+  0,						/*tp_as_buffer*/
+  Py_TPFLAGS_DEFAULT | Py_TPFLAGS_BASETYPE,	/*tp_flags*/
+  "GDB object, representing a disassembler result",	/* tp_doc */
+  0,						/* tp_traverse */
+  0,						/* tp_clear */
+  0,						/* tp_richcompare */
+  0,						/* tp_weaklistoffset */
+  0,						/* tp_iter */
+  0,						/* tp_iternext */
+  0,						/* tp_methods */
+  0,						/* tp_members */
+  disasm_result_object_getset,			/* tp_getset */
+  0,						/* tp_base */
+  0,						/* tp_dict */
+  0,						/* tp_descr_get */
+  0,						/* tp_descr_set */
+  0,						/* tp_dictoffset */
+  disasmpy_result_init,				/* tp_init */
+  0,						/* tp_alloc */
+};
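A note for readers of the DisassembleInfo lifetime handling in py-disasm.c
above: the copy-and-invalidate chain built by disasm_info_init and torn
down by ~scoped_disasm_info_object can be modelled in a few lines of plain
Python.  This is only a sketch and is not part of the patch; InfoModel and
invalidate_chain are hypothetical names, and a dummy object stands in for
the gdb_info pointer.  Reference counting is deliberately left out.

```python
class InfoModel:
    """Hypothetical stand-in for disasm_info_object (not the gdb API)."""

    def __init__(self, other=None):
        if other is None:
            # Models scoped_disasm_info_object: a fresh object that is
            # valid and has no copies yet.
            self.gdb_info = object()
            self.next = None
        else:
            # Models disasm_info_init: copy the fields from OTHER, then
            # splice the new object into the chain right after OTHER.
            self.gdb_info = other.gdb_info
            self.next = other.next
            other.next = self

    def is_valid(self):
        # Models disasm_info_object_is_valid.
        return self.gdb_info is not None


def invalidate_chain(head):
    """Models the destructor loop: clear gdb_info on HEAD and every copy."""
    obj = head
    while obj is not None:
        obj.gdb_info = None
        obj = obj.next


original = InfoModel()
copy_a = InfoModel(original)
copy_b = InfoModel(copy_a)      # a copy of a copy stays on the same chain
invalidate_chain(original)
print([o.is_valid() for o in (original, copy_a, copy_b)])
```

Because every copy is spliced in behind the object it was copied from, a
single walk from the original reaches all of them, which is what lets the
destructor invalidate copies the user may still be holding.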
diff --git a/gdb/python/python-internal.h b/gdb/python/python-internal.h
index da2e79101a6..5ff9989af83 100644
--- a/gdb/python/python-internal.h
+++ b/gdb/python/python-internal.h
@@ -540,6 +540,8 @@ int gdbpy_initialize_connection ()
 int gdbpy_initialize_micommands (void)
   CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 void gdbpy_finalize_micommands ();
+int gdbpy_initialize_disasm ()
+  CPYCHECKER_NEGATIVE_RESULT_SETS_EXCEPTION;
 
 /* A wrapper for PyErr_Fetch that handles reference counting for the
    caller.  */
@@ -587,6 +589,13 @@ class gdbpy_err_fetch
     return PyErr_GivenExceptionMatches (m_error_type.get (), type);
   }
 
+  /* Return a new reference to the exception value object.  */
+
+  gdbpy_ref<> value ()
+  {
+    return m_error_value;
+  }
+
 private:
 
   gdbpy_ref<> m_error_type, m_error_value, m_error_traceback;
@@ -840,4 +849,18 @@ extern bool gdbpy_is_progspace (PyObject *obj);
 extern gdb::unique_xmalloc_ptr<char> gdbpy_fix_doc_string_indentation
   (gdb::unique_xmalloc_ptr<char> doc);
 
+/* Implement the 'print_insn' hook for Python.  Disassemble an instruction
+   whose address is ADDRESS for architecture GDBARCH.  The bytes of the
+   instruction should be read with INFO->read_memory_func as the
+   instruction being disassembled might actually be in a buffer.
+
+   Use INFO->fprintf_func to print the results of the disassembly, and
+   return the length of the instruction in octets.
+
+   If no instruction can be disassembled then return an empty value.  */
+
+extern gdb::optional<int> gdbpy_print_insn (struct gdbarch *gdbarch,
+					    CORE_ADDR address,
+					    disassemble_info *info);
+
 #endif /* PYTHON_PYTHON_INTERNAL_H */
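The contract in the gdbpy_print_insn comment above (return the instruction length, or an empty value when nothing was disassembled) can be modelled outside GDB. The following is a rough Python sketch, not GDB's code: `run_print_insn_hooks`, `builtin_print_insn`, and `tagging_hook` are hypothetical names, `None` stands in for an empty `gdb::optional<int>`, and the fallback to a built-in disassembler when every hook declines is an assumption about the caller.

```python
from typing import Callable, List, Optional

# Hypothetical hook type: given an address and an output list, either
# append the disassembly text and return the instruction length in
# octets, or return None to decline.
Hook = Callable[[int, List[str]], Optional[int]]


def builtin_print_insn(address: int, out: List[str]) -> int:
    """Stand-in for the built-in disassembler: everything is a 1-byte nop."""
    out.append("nop")
    return 1


def run_print_insn_hooks(hooks: List[Hook], address: int, out: List[str]) -> int:
    """Try each hook in turn; the first one that returns a length wins.

    If every hook returns None (the "empty value" case), fall back to
    the stand-in built-in disassembler."""
    for hook in hooks:
        length = hook(address, out)
        if length is not None:
            return length
    return builtin_print_insn(address, out)


def tagging_hook(address: int, out: List[str]) -> Optional[int]:
    """Example hook: only handles one address, declines all others."""
    if address != 0x1000:
        return None
    out.append("nop\t## tagged")
    return 1
```

Returning an empty value rather than an error lets a hook opt out per instruction, which is what the test script's `TestDisassembler.__call__` does for every address other than `current_pc`.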
diff --git a/gdb/python/python.c b/gdb/python/python.c
index 97de5f5cee5..079c260fc7f 100644
--- a/gdb/python/python.c
+++ b/gdb/python/python.c
@@ -167,7 +167,7 @@ static const struct extension_language_ops python_extension_ops =
 
   gdbpy_colorize_disasm,
 
-  NULL, /* gdbpy_print_insn, */
+  gdbpy_print_insn,
 };
 
 #endif /* HAVE_PYTHON */
@@ -2053,6 +2053,7 @@ do_start_initialization ()
 
   if (gdbpy_initialize_auto_load () < 0
       || gdbpy_initialize_values () < 0
+      || gdbpy_initialize_disasm () < 0
       || gdbpy_initialize_frames () < 0
       || gdbpy_initialize_commands () < 0
       || gdbpy_initialize_instruction () < 0
diff --git a/gdb/testsuite/gdb.python/py-disasm.c b/gdb/testsuite/gdb.python/py-disasm.c
new file mode 100644
index 00000000000..ee0bb157f4d
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.c
@@ -0,0 +1,25 @@
+/* This test program is part of GDB, the GNU debugger.
+
+   Copyright 2021-2022 Free Software Foundation, Inc.
+
+   This program is free software; you can redistribute it and/or modify
+   it under the terms of the GNU General Public License as published by
+   the Free Software Foundation; either version 3 of the License, or
+   (at your option) any later version.
+
+   This program is distributed in the hope that it will be useful,
+   but WITHOUT ANY WARRANTY; without even the implied warranty of
+   MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+   GNU General Public License for more details.
+
+   You should have received a copy of the GNU General Public License
+   along with this program.  If not, see <http://www.gnu.org/licenses/>.  */
+
+int
+main ()
+{
+  asm ("nop");
+  asm ("nop");	/* Break here.  */
+  asm ("nop");
+  return 0;
+}
diff --git a/gdb/testsuite/gdb.python/py-disasm.exp b/gdb/testsuite/gdb.python/py-disasm.exp
new file mode 100644
index 00000000000..1b9cd4465ac
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.exp
@@ -0,0 +1,209 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+# This file is part of the GDB testsuite.  It validates the Python
+# disassembler API.
+
+load_lib gdb-python.exp
+
+standard_testfile
+
+if { [prepare_for_testing "failed to prepare" ${testfile} ${srcfile} "debug"] } {
+    return -1
+}
+
+# Skip all tests if Python scripting is not enabled.
+if { [skip_python_tests] } { continue }
+
+if ![runto_main] then {
+    fail "can't run to main"
+    return 0
+}
+
+set pyfile [gdb_remote_download host ${srcdir}/${subdir}/${testfile}.py]
+
+gdb_test "source ${pyfile}" "Python script imported" \
+         "import python scripts"
+
+gdb_breakpoint [gdb_get_line_number "Break here."]
+gdb_continue_to_breakpoint "Break here."
+
+set curr_pc [get_valueof "/x" "\$pc" "*unknown*"]
+
+gdb_test_no_output "python current_pc = ${curr_pc}"
+
+# The current pc will be something like 0x1234 with no leading zeros.
+# However, in the disassembler output addresses are padded with zeros.
+# This substitution changes 0x1234 to 0x0*1234, which can then be used
+# as a regexp in the disassembler output matching.
+set curr_pc_pattern [string replace ${curr_pc} 0 1 "0x0*"]
+
+# Grab the name of the current architecture; this is used in the test
+# patterns below.
+set curr_arch [get_python_valueof "gdb.selected_inferior().architecture().name()" "*unknown*"]
+
+# Helper proc that removes all registered disassemblers.
+proc py_remove_all_disassemblers {} {
+    gdb_test_no_output "python remove_all_python_disassemblers()"
+}
+
+# A list of test plans.  Each plan is a list of two elements: the
+# first element is the name of a disassembler class in py-disasm.py,
+# and the second element is a pattern that should be matched in the
+# disassembler output.
+#
+# Each disassembler tests a different feature of the Python
+# disassembler API.
+set unknown_error_pattern "unknown disassembler error \\(error = -1\\)"
+set addr_pattern "\r\n=> ${curr_pc_pattern} <\[^>\]+>:\\s+"
+set base_pattern "${addr_pattern}nop"
+set test_plans \
+    [list \
+	 [list "" "${base_pattern}\r\n.*"] \
+	 [list "GlobalNullDisassembler" "${base_pattern}\r\n.*"] \
+	 [list "GlobalPreInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalPostInfoDisassembler" "${base_pattern}\\s+## ad = $hex, ar = ${curr_arch}\r\n.*"] \
+	 [list "GlobalReadDisassembler" "${base_pattern}\\s+## bytes =( $hex)+\r\n.*"] \
+	 [list "GlobalAddrDisassembler" "${base_pattern}\\s+## addr = ${curr_pc_pattern} <\[^>\]+>\r\n.*"] \
+	 [list "GdbErrorEarlyDisassembler" "${addr_pattern}GdbError instead of a result\r\n${unknown_error_pattern}"] \
+	 [list "RuntimeErrorEarlyDisassembler" "${addr_pattern}Python Exception <class 'RuntimeError'>: RuntimeError instead of a result\r\n\r\n${unknown_error_pattern}"] \
+	 [list "GdbErrorLateDisassembler" "${addr_pattern}GdbError after builtin disassembler\r\n${unknown_error_pattern}"] \
+	 [list "RuntimeErrorLateDisassembler" "${addr_pattern}Python Exception <class 'RuntimeError'>: RuntimeError after builtin disassembler\r\n\r\n${unknown_error_pattern}"] \
+	 [list "MemoryErrorEarlyDisassembler" "${base_pattern}\\s+## AFTER ERROR\r\n.*"] \
+	 [list "MemoryErrorLateDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "RethrowMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address $hex"] \
+	 [list "ReadMemoryMemoryErrorDisassembler" "${addr_pattern}Cannot access memory at address ${curr_pc_pattern}"] \
+	 [list "ReadMemoryGdbErrorDisassembler" "${addr_pattern}read_memory raised GdbError\r\n${unknown_error_pattern}"] \
+	 [list "ReadMemoryRuntimeErrorDisassembler" "${addr_pattern}Python Exception <class 'RuntimeError'>: read_memory raised RuntimeError\r\n\r\n${unknown_error_pattern}"] \
+	 [list "ReadMemoryCaughtMemoryErrorDisassembler" "${addr_pattern}nop\r\n.*"] \
+	 [list "ReadMemoryCaughtGdbErrorDisassembler" "${addr_pattern}nop\r\n.*"] \
+	 [list "ReadMemoryCaughtRuntimeErrorDisassembler" "${addr_pattern}nop\r\n.*"] \
+	 [list "MemorySourceNotABufferDisassembler" "${addr_pattern}Python Exception <class 'TypeError'>: Result from read_memory is not a buffer\r\n\r\n${unknown_error_pattern}"] \
+	 [list "MemorySourceBufferTooLongDisassembler" "${addr_pattern}Python Exception <class 'ValueError'>: Buffer returned from read_memory is sized $decimal instead of the expected $decimal\r\n\r\n${unknown_error_pattern}"] \
+	 [list "ResultOfWrongType" "${addr_pattern}Python Exception <class 'TypeError'>: Result is not a DisassemblerResult.\r\n.*"] \
+	 [list "ResultWithInvalidLength" "${addr_pattern}Python Exception <class 'ValueError'>: Invalid length attribute: length must be greater than 0.\r\n.*"] \
+	 [list "ResultWithInvalidString" "${addr_pattern}Python Exception <class 'ValueError'>: String attribute must not be empty.\r\n.*"]]
+
+# Now execute each test plan.
+foreach plan $test_plans {
+    set global_disassembler_name [lindex $plan 0]
+    set expected_pattern [lindex $plan 1]
+
+    with_test_prefix "global_disassembler=${global_disassembler_name}" {
+	# Remove all existing disassemblers.
+	py_remove_all_disassemblers
+
+	# If we have a disassembler to load, do it now.
+	if { $global_disassembler_name != "" } {
+	    gdb_test_no_output "python add_global_disassembler($global_disassembler_name)"
+	}
+
+	# Disassemble main, and check the disassembler output.
+	gdb_test "disassemble main" $expected_pattern
+    }
+}
+
+# Check some errors relating to DisassemblerResult creation.
+with_test_prefix "DisassemblerResult errors" {
+    gdb_test "python gdb.disassembler.DisassemblerResult(0, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(-1, 'abc')" \
+	[multi_line \
+	     "ValueError: Length must be greater than 0." \
+	     "Error while executing Python code."]
+    gdb_test "python gdb.disassembler.DisassemblerResult(1, '')" \
+	[multi_line \
+	     "ValueError: String must not be empty." \
+	     "Error while executing Python code."]
+}
+
+# Check that the architecture specific disassemblers can override the
+# global disassembler.
+#
+# First, register a global disassembler, and check it is in place.
+with_test_prefix "GLOBAL tagging disassembler" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"GLOBAL\"), None)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Now register an architecture specific disassembler, and check it
+# overrides the global disassembler.
+with_test_prefix "LOCAL tagging disassembler" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(TaggingDisassembler(\"LOCAL\"), \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = LOCAL\r\n.*"
+}
+
+# Now remove the architecture specific disassembler, and check that
+# the global disassembler kicks back in.
+with_test_prefix "GLOBAL tagging disassembler again" {
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(None, \"${curr_arch}\")"
+    gdb_test "disassemble main" "${base_pattern}\\s+## tag = GLOBAL\r\n.*"
+}
+
+# Check that a DisassembleInfo becomes invalid after the call into the
+# disassembler.
+with_test_prefix "DisassembleInfo becomes invalid" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python add_global_disassembler(GlobalCachingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\\s+## CACHED\r\n.*"
+    gdb_test "python GlobalCachingDisassembler.check()" "PASS"
+}
+
+# Test the memory source aspect of the builtin disassembler.
+with_test_prefix "memory source api" {
+    py_remove_all_disassemblers
+    gdb_test_no_output "python analyzing_disassembler = add_global_disassembler(AnalyzingDisassembler)"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*"
+    gdb_test "python analyzing_disassembler.find_replacement_candidate()" \
+	"Replace from $hex to $hex with NOP"
+    gdb_test "disassemble main" "${base_pattern}\r\n.*" \
+	"second disassembler pass"
+    gdb_test "python analyzing_disassembler.check()" \
+	"PASS"
+}
+
+# Test the 'maint info python-disassemblers' command.
+with_test_prefix "maint info python-disassemblers" {
+    py_remove_all_disassemblers
+    gdb_test "maint info python-disassemblers" "No Python disassemblers registered\\." \
+	"list disassemblers, none registered"
+    gdb_test_no_output "python disasm = add_global_disassembler(BuiltinDisassembler)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "GLOBAL\\s+BuiltinDisassembler\\s+\\(Matches current architecture\\)"] \
+	"list disassemblers, single global disassembler"
+    gdb_test_no_output "python arch = gdb.selected_inferior().architecture().name()"
+    gdb_test_no_output "python gdb.disassembler.register_disassembler(disasm, arch)"
+    gdb_test "maint info python-disassemblers" \
+	[multi_line \
+	     "Architecture\\s+Disassember Name" \
+	     "\[^\r\n\]+BuiltinDisassembler\\s+\\(Matches current architecture\\)" \
+	     "GLOBAL\\s+BuiltinDisassembler"] \
+	"list disassemblers, multiple disassemblers registered"
+}
+
+# Check that an attempt to create a "new" DisassembleInfo object fails.
+with_test_prefix "Bad DisassembleInfo creation" {
+    gdb_test_no_output "python my_info = InvalidDisassembleInfo()"
+    gdb_test "python print(my_info.is_valid())" "True"
+    gdb_test "python gdb.disassembler.builtin_disassemble(my_info)" \
+	[multi_line \
+	     "RuntimeError: DisassembleInfo is no longer valid\\." \
+	     "Error while executing Python code\\."]
+}
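Two pieces of logic that the test script above relies on can be sketched in plain Python: the zero-padding address pattern, and the rule that an architecture-specific disassembler overrides the global one until it is removed. This is only an illustrative model. `padded_address_pattern` and `DisassemblerRegistry` are hypothetical names, the architecture strings are arbitrary labels, and the real registration logic lives in gdb/python/lib/gdb/disassembler.py, which may differ in detail.

```python
import re
from typing import Dict, Optional


def padded_address_pattern(pc: str) -> str:
    """Turn '0x1234' into the regexp '0x0*1234'.

    This mirrors the Tcl 'string replace ${curr_pc} 0 1 "0x0*"' step, so
    that a zero-padded address in the disassembler output still matches."""
    assert pc.startswith("0x")
    return "0x0*" + re.escape(pc[2:])


class DisassemblerRegistry:
    """Toy model of per-architecture registration with a global fallback."""

    def __init__(self) -> None:
        # Key None holds the global disassembler.
        self._by_arch: Dict[Optional[str], object] = {}

    def register(self, disassembler, architecture: Optional[str] = None) -> None:
        """Register DISASSEMBLER for ARCHITECTURE (None means global).

        Passing None as DISASSEMBLER removes any existing entry."""
        self._by_arch.pop(architecture, None)
        if disassembler is not None:
            self._by_arch[architecture] = disassembler

    def lookup(self, architecture: str):
        """Prefer the architecture-specific entry, else the global one."""
        if architecture in self._by_arch:
            return self._by_arch[architecture]
        return self._by_arch.get(None)
```

In this model, registering `None` for an architecture simply deletes that entry, so the next lookup falls through to the global disassembler, which is the behaviour the "GLOBAL tagging disassembler again" test checks.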
diff --git a/gdb/testsuite/gdb.python/py-disasm.py b/gdb/testsuite/gdb.python/py-disasm.py
new file mode 100644
index 00000000000..ff7ffdb97d9
--- /dev/null
+++ b/gdb/testsuite/gdb.python/py-disasm.py
@@ -0,0 +1,712 @@
+# Copyright (C) 2021-2022 Free Software Foundation, Inc.
+
+# This program is free software; you can redistribute it and/or modify
+# it under the terms of the GNU General Public License as published by
+# the Free Software Foundation; either version 3 of the License, or
+# (at your option) any later version.
+#
+# This program is distributed in the hope that it will be useful,
+# but WITHOUT ANY WARRANTY; without even the implied warranty of
+# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the
+# GNU General Public License for more details.
+#
+# You should have received a copy of the GNU General Public License
+# along with this program.  If not, see <http://www.gnu.org/licenses/>.
+
+import gdb
+import gdb.disassembler
+import struct
+import sys
+
+from gdb.disassembler import Disassembler, DisassemblerResult
+
+# A global that holds the program-counter address at which we should
+# perform the extra disassembly that this script provides.
+current_pc = None
+
+
+# Remove all currently registered disassemblers.
+def remove_all_python_disassemblers():
+    for a in gdb.architecture_names():
+        gdb.disassembler.register_disassembler(None, a)
+    gdb.disassembler.register_disassembler(None, None)
+
+
+class TestDisassembler(Disassembler):
+    """A base class for disassemblers within this script to inherit from.
+    Implements the __call__ method and ensures we only do any
+    disassembly wrapping for the global CURRENT_PC."""
+
+    def __init__(self):
+        global current_pc
+
+        super().__init__("TestDisassembler")
+        self.__info = None
+        if current_pc is None:
+            raise gdb.GdbError("no current_pc set")
+
+    def __call__(self, info):
+        global current_pc
+
+        if info.address != current_pc:
+            return None
+        self.__info = info
+        return self.disassemble(info)
+
+    def get_info(self):
+        return self.__info
+
+    def disassemble(self, info):
+        raise NotImplementedError("override the disassemble method")
+
+
+class GlobalPreInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo before disassembly has occurred."""
+
+    def disassemble(self, info):
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalPostInfoDisassembler(TestDisassembler):
+    """Check the attributes of DisassembleInfo after disassembly has occurred."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        ad = info.address
+        ar = info.architecture
+
+        if ad != current_pc:
+            raise gdb.GdbError("invalid address")
+
+        if not isinstance(ar, gdb.Architecture):
+            raise gdb.GdbError("invalid architecture type")
+
+        text = result.string + "\t## ad = 0x%x, ar = %s" % (ad, ar.name())
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalReadDisassembler(TestDisassembler):
+    """Check the DisassembleInfo.read_memory method.  Calls the builtin
+    disassembler, then reads all of the bytes of this instruction, and
+    adds them as a comment to the disassembler output."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        length = result.length
+        byte_text = ""
+        for o in range(length):
+            if byte_text != "":
+                byte_text += " "
+            v = bytes(info.read_memory(1, o))[0]
+            if sys.version_info[0] < 3:
+                v = struct.unpack("<B", v)
+            byte_text += "0x%02x" % v
+        text = result.string + "\t## bytes = %s" % byte_text
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalAddrDisassembler(TestDisassembler):
+    """Check the gdb.format_address method."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        arch = info.architecture
+        addr = info.address
+        program_space = info.progspace
+        addr_text = gdb.format_address(addr, program_space, arch)
+        text = result.string + "\t## addr = %s" % addr_text
+        return DisassemblerResult(result.length, text)
+
+
+class GdbErrorEarlyDisassembler(TestDisassembler):
+    """Raise a GdbError instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise gdb.GdbError("GdbError instead of a result")
+
+
+class RuntimeErrorEarlyDisassembler(TestDisassembler):
+    """Raise a RuntimeError instead of performing any disassembly."""
+
+    def disassemble(self, info):
+        raise RuntimeError("RuntimeError instead of a result")
+
+
+class GdbErrorLateDisassembler(TestDisassembler):
+    """Raise a GdbError after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise gdb.GdbError("GdbError after builtin disassembler")
+
+
+class RuntimeErrorLateDisassembler(TestDisassembler):
+    """Raise a RuntimeError after calling the builtin disassembler."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        raise RuntimeError("RuntimeError after builtin disassembler")
+
+
+class MemoryErrorEarlyDisassembler(TestDisassembler):
+    """Throw a memory error, ignore the error and disassemble."""
+
+    def disassemble(self, info):
+        tag = "## FAIL"
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            tag = "## AFTER ERROR"
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t" + tag
+        return DisassemblerResult(result.length, text)
+
+
+class MemoryErrorLateDisassembler(TestDisassembler):
+    """Throw a memory error after calling the builtin disassembler, but
+    before we return a result."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        # The following read will throw an error.
+        info.read_memory(1, -info.address + 2)
+        return DisassemblerResult(1, "BAD")
+
+
+class RethrowMemoryErrorDisassembler(TestDisassembler):
+    """Catch and rethrow a memory error."""
+
+    def disassemble(self, info):
+        try:
+            info.read_memory(1, -info.address + 2)
+        except gdb.MemoryError:
+            raise gdb.MemoryError("cannot read code at address 0x2")
+        return DisassemblerResult(1, "BAD")
+
+
+class ResultOfWrongType(TestDisassembler):
+    """Return something that is not a DisassemblerResult from disassemble method"""
+
+    class Blah:
+        def __init__(self, length, string):
+            self.length = length
+            self.string = string
+
+    def disassemble(self, info):
+        return self.Blah(1, "ABC")
+
+
+class ResultWrapper(gdb.disassembler.DisassemblerResult):
+    def __init__(self, length, string, length_x=None, string_x=None):
+        super().__init__(length, string)
+        if length_x is None:
+            self.__length = length
+        else:
+            self.__length = length_x
+        if string_x is None:
+            self.__string = string
+        else:
+            self.__string = string_x
+
+    @property
+    def length(self):
+        return self.__length
+
+    @property
+    def string(self):
+        return self.__string
+
+
+class ResultWithInvalidLength(TestDisassembler):
+    """Return a result object with an invalid length."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, 0)
+
+
+class ResultWithInvalidString(TestDisassembler):
+    """Return a result object with an empty string."""
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        return ResultWrapper(result.length, result.string, None, "")
+
+
+class TaggingDisassembler(TestDisassembler):
+    """A simple disassembler that just tags the output."""
+
+    def __init__(self, tag):
+        super().__init__()
+        self._tag = tag
+
+    def disassemble(self, info):
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## tag = %s" % self._tag
+        return DisassemblerResult(result.length, text)
+
+
+class GlobalCachingDisassembler(TestDisassembler):
+    """A disassembler that caches the DisassembleInfo that is passed in,
+    as well as a copy of the original DisassembleInfo.
+
+    Once the call into the disassembler is complete then the
+    DisassembleInfo objects become invalid, and any calls into them
+    should trigger an exception."""
+
+    # This is where we cache the DisassembleInfo objects.
+    cached_insn_disas = []
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+    def disassemble(self, info):
+        """Disassemble the instruction, add a CACHED comment to the output,
+        and cache the DisassembleInfo so that it is not garbage collected."""
+        GlobalCachingDisassembler.cached_insn_disas.append(info)
+        GlobalCachingDisassembler.cached_insn_disas.append(self.MyInfo(info))
+        result = gdb.disassembler.builtin_disassemble(info)
+        text = result.string + "\t## CACHED"
+        return DisassemblerResult(result.length, text)
+
+    @staticmethod
+    def check():
+        """Check that all of the methods on the cached DisassembleInfo trigger an
+        exception."""
+        for info in GlobalCachingDisassembler.cached_insn_disas:
+            assert isinstance(info, gdb.disassembler.DisassembleInfo)
+            assert not info.is_valid()
+            try:
+                val = info.address
+                raise gdb.GdbError("DisassembleInfo.address is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.address raised an unexpected exception"
+                )
+
+            try:
+                val = info.architecture
+                raise gdb.GdbError("DisassembleInfo.architecture is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.architecture raised an unexpected exception"
+                )
+
+            try:
+                val = info.read_memory(1, 0)
+                raise gdb.GdbError("DisassembleInfo.read is still valid")
+            except RuntimeError as e:
+                assert str(e) == "DisassembleInfo is no longer valid."
+            except:
+                raise gdb.GdbError(
+                    "DisassembleInfo.read raised an unexpected exception"
+                )
+
+        print("PASS")
+
+
+class GlobalNullDisassembler(TestDisassembler):
+    """A disassembler that does not change the output at all."""
+
+    def disassemble(self, info):
+        pass
+
+
+class ReadMemoryMemoryErrorDisassembler(TestDisassembler):
+    """Raise a MemoryError exception from the DisassembleInfo.read_memory
+    method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            # Throw a memory error with a specific address.  We don't
+            # expect this address to show up in the output though.
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class ReadMemoryGdbErrorDisassembler(TestDisassembler):
+    """Raise a GdbError exception from the DisassembleInfo.read_memory
+    method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("read_memory raised GdbError")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class ReadMemoryRuntimeErrorDisassembler(TestDisassembler):
+    """Raise a RuntimeError exception from the DisassembleInfo.read_memory
+    method."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise RuntimeError("read_memory raised RuntimeError")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class ReadMemoryCaughtMemoryErrorDisassembler(TestDisassembler):
+    """Raise a MemoryError exception from the DisassembleInfo.read_memory
+    method, catch this in the outer disassembler."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.MemoryError(0x1234)
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except gdb.MemoryError:
+            return None
+
+
+class ReadMemoryCaughtGdbErrorDisassembler(TestDisassembler):
+    """Raise a GdbError exception from the DisassembleInfo.read_memory
+    method, catch this in the outer disassembler."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise gdb.GdbError("exception message")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except gdb.GdbError as e:
+            if e.args[0] == "exception message":
+                return None
+            raise e
+
+
+class ReadMemoryCaughtRuntimeErrorDisassembler(TestDisassembler):
+    """Raise a RuntimeError exception from the DisassembleInfo.read_memory
+    method, catch this in the outer disassembler."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            raise RuntimeError("exception message")
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        try:
+            return gdb.disassembler.builtin_disassemble(info)
+        except RuntimeError as e:
+            if e.args[0] == "exception message":
+                return None
+            raise e
+
+
+class MemorySourceNotABufferDisassembler(TestDisassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            return 1234
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class MemorySourceBufferTooLongDisassembler(TestDisassembler):
+    """The read memory returns too many bytes."""
+
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        def __init__(self, info):
+            super().__init__(info)
+
+        def read_memory(self, length, offset):
+            buffer = super().read_memory(length, offset)
+            # Create a new memory view made by duplicating BUFFER.  This
+            # will trigger an error as GDB expects a buffer of exactly
+            # LENGTH to be returned, while this will return a buffer of
+            # 2*LENGTH.
+            return memoryview(
+                bytes([int.from_bytes(x, "little") for x in (list(buffer[0:]) * 2)])
+            )
+
+    def disassemble(self, info):
+        info = self.MyInfo(info)
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class BuiltinDisassembler(Disassembler):
+    """Just calls the builtin disassembler."""
+
+    def __init__(self):
+        super().__init__("BuiltinDisassembler")
+
+    def __call__(self, info):
+        return gdb.disassembler.builtin_disassemble(info)
+
+
+class AnalyzingDisassembler(Disassembler):
+    class MyInfo(gdb.disassembler.DisassembleInfo):
+        """Wrapper around builtin DisassembleInfo type that overrides the
+        read_memory method."""
+
+        def __init__(self, info, start, end, nop_bytes):
+            """INFO is the DisassembleInfo we are wrapping.  START and END are
+            addresses, and NOP_BYTES should be a memoryview object.
+
+            The length (END - START) should be the same as the length
+            of NOP_BYTES.
+
+            Any memory read requests outside the START->END range are
+            serviced normally, but any attempt to read within the
+            START->END range will return content from NOP_BYTES."""
+            super().__init__(info)
+            self._start = start
+            self._end = end
+            self._nop_bytes = nop_bytes
+
+        def _read_replacement(self, length, offset):
+            """Return a slice of the buffer representing the replacement nop
+            instructions."""
+
+            assert self._nop_bytes is not None
+            rb = self._nop_bytes
+
+            # If this request is outside of a nop instruction then we don't know
+            # what to do, so just raise a memory error.
+            if offset >= len(rb) or (offset + length) > len(rb):
+                raise gdb.MemoryError("invalid length and offset combination")
+
+            # Return only the slice of the nop instruction as requested.
+            s = offset
+            e = offset + length
+            return rb[s:e]
+
+        def read_memory(self, length, offset=0):
+            """Callback used by the builtin disassembler to read the contents of
+            memory."""
+
+            # If this request is within the region we are replacing with 'nop'
+            # instructions, then call the helper function to perform that
+            # replacement.
+            if self._start is not None:
+                assert self._end is not None
+                if self.address >= self._start and self.address < self._end:
+                    return self._read_replacement(length, offset)
+
+            # Otherwise, we just forward this request to the default read memory
+            # implementation.
+            return super().read_memory(length, offset)
+
+    def __init__(self):
+        """Constructor."""
+        super().__init__("AnalyzingDisassembler")
+
+        # Details about the instructions found during the first disassembler
+        # pass.
+        self._pass_1_length = []
+        self._pass_1_insn = []
+        self._pass_1_address = []
+
+        # The start and end address for the instruction we will replace with
+        # one or more 'nop' instructions during pass two.
+        self._start = None
+        self._end = None
+
+        # The index in the _pass_1_* lists for where the nop instruction can
+        # be found, also, the buffer of bytes that make up a nop instruction.
+        self._nop_index = None
+        self._nop_bytes = None
+
+        # A flag that indicates if we are in the first or second pass of
+        # this disassembler test.
+        self._first_pass = True
+
+        # The disassembled instructions collected during the second pass.
+        self._pass_2_insn = []
+
+        # A copy of _pass_1_insn that has been modified to include the extra
+        # 'nop' instructions we plan to insert during the second pass.  This
+        # is then checked against _pass_2_insn after the second disassembler
+        # pass has completed.
+        self._check = []
+
+    def __call__(self, info):
+        """Called to perform the disassembly."""
+
+        # Override the info object, this provides access to our
+        # read_memory function.
+        info = self.MyInfo(info, self._start, self._end, self._nop_bytes)
+        result = gdb.disassembler.builtin_disassemble(info)
+
+        # Record some information about the first 'nop' instruction we find.
+        if self._nop_index is None and result.string == "nop":
+            self._nop_index = len(self._pass_1_length)
+            # The offset in the following read_memory call defaults to 0.
+            print("APB: Reading nop bytes")
+            self._nop_bytes = info.read_memory(result.length)
+
+        # Record information about each instruction that is disassembled.
+        # This test is performed in two passes, and we need different
+        # information in each pass.
+        if self._first_pass:
+            self._pass_1_length.append(result.length)
+            self._pass_1_insn.append(result.string)
+            self._pass_1_address.append(info.address)
+        else:
+            self._pass_2_insn.append(result.string)
+
+        return result
+
+    def find_replacement_candidate(self):
+        """Call this after the first disassembly pass.  This identifies a suitable
+        instruction to replace with 'nop' instruction(s)."""
+
+        if self._nop_index is None:
+            raise gdb.GdbError("no nop was found")
+
+        nop_idx = self._nop_index
+        nop_length = self._pass_1_length[nop_idx]
+
+        # First we look for an instruction that is larger than a nop
+        # instruction, but whose length is an exact multiple of the nop
+        # instruction's length.
+        replace_idx = None
+        for idx in range(len(self._pass_1_length)):
+            if (
+                idx > 0
+                and idx != nop_idx
+                and self._pass_1_insn[idx] != "nop"
+                and self._pass_1_length[idx] > self._pass_1_length[nop_idx]
+                and self._pass_1_length[idx] % self._pass_1_length[nop_idx] == 0
+            ):
+                replace_idx = idx
+                break
+
+        # If we still don't have a replacement candidate, then search again,
+        # this time looking for an instruction that is the same length as a
+        # nop instruction.
+        if replace_idx is None:
+            for idx in range(len(self._pass_1_length)):
+                if (
+                    idx > 0
+                    and idx != nop_idx
+                    and self._pass_1_insn[idx] != "nop"
+                    and self._pass_1_length[idx] == self._pass_1_length[nop_idx]
+                ):
+                    replace_idx = idx
+                    break
+
+        # Weird, the nop instruction must be larger than every other
+        # instruction, or all instructions are 'nop'?
+        if replace_idx is None:
+            raise gdb.GdbError("can't find an instruction to replace")
+
+        # Record the instruction range that will be replaced with 'nop'
+        # instructions, and mark that we are now on the second pass.
+        self._start = self._pass_1_address[replace_idx]
+        self._end = self._pass_1_address[replace_idx] + self._pass_1_length[replace_idx]
+        self._first_pass = False
+        print("Replace from 0x%x to 0x%x with NOP" % (self._start, self._end))
+
+        # Finally, build the expected result.  Create the _check list, which
+        # is a copy of _pass_1_insn, but replace the instruction we
+        # identified above with a series of 'nop' instructions.
+        self._check = list(self._pass_1_insn)
+        nop_count = int(self._pass_1_length[replace_idx] / self._pass_1_length[nop_idx])
+        nops = ["nop"] * nop_count
+        self._check[replace_idx : (replace_idx + 1)] = nops
+
+    def check(self):
+        """Call this after the second disassembler pass to validate the output."""
+        if self._check != self._pass_2_insn:
+            print("APB, Check : %s" % self._check)
+            print("APB, Result: %s" % self._pass_2_insn)
+            raise gdb.GdbError("mismatch")
+        print("PASS")
+
+
+def add_global_disassembler(dis_class):
+    """Create an instance of DIS_CLASS and register it as a global disassembler."""
+    dis = dis_class()
+    gdb.disassembler.register_disassembler(dis, None)
+    return dis
+
+
+class InvalidDisassembleInfo(gdb.disassembler.DisassembleInfo):
+    """An attempt to create a DisassembleInfo sub-class without calling
+    the parent class init method.
+
+    Attempts to use instances of this class should throw an error
+    saying that the DisassembleInfo is not valid, despite this class
+    having all of the required attributes.
+
+    The reason why this class will never be valid is that an internal
+    field (within the C++ code) can't be initialized without calling
+    the parent class init method."""
+
+    def __init__(self):
+        assert current_pc is not None
+
+    def is_valid(self):
+        return True
+
+    @property
+    def address(self):
+        global current_pc
+        return current_pc
+
+    @property
+    def architecture(self):
+        return gdb.selected_inferior().architecture()
+
+    @property
+    def progspace(self):
+        return gdb.selected_inferior().progspace
+
+
+# Start with all disassemblers removed.
+remove_all_python_disassemblers()
+
+print("Python script imported")
-- 
2.25.4



* [PUSHED 5/6] gdb: refactor the non-printing disassemblers
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
                               ` (3 preceding siblings ...)
  2022-06-15  9:04             ` [PUSHED 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
@ 2022-06-15  9:04             ` Andrew Burgess
  2022-06-15  9:04             ` [PUSHED 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

This commit started from an observation I made while working on some
other disassembler patches: the function gdb_buffered_insn_length is
broken ... sort of.

I noticed that the gdb_buffered_insn_length function doesn't set up
the application_data field of the disassemble_info structure.

Further, I noticed that some architectures, for example, ARM, require
that the application_data field be set, see gdb_print_insn_arm in
arm-tdep.c.

And so, if we ever use gdb_buffered_insn_length for ARM, then GDB will
likely crash.  Which is why I said only "sort of" broken.  Right now
we don't use gdb_buffered_insn_length with ARM, so maybe it isn't
broken yet?

Anyway, to prove to myself that there was a problem here I extended the
disassembler self tests in disasm-selftests.c to include a test of
gdb_buffered_insn_length.  As I run the test for all architectures, I
do indeed see GDB crash for ARM.

To fix this we need gdb_buffered_insn_length to create a disassembler
that inherits from gdb_disassemble_info, but we also need this new
disassembler to not print anything.

And so, I introduce a new gdb_non_printing_disassembler class; this is
a disassembler that doesn't print anything to the output stream.

I then observed that both ARC and S12Z also create non-printing
disassemblers, but these are slightly different.  While the
disassembler in gdb_non_printing_disassembler reads the instruction
from a buffer, the ARC and S12Z disassemblers read from target memory
using target_read_code.

And so, I further split gdb_non_printing_disassembler into two
sub-classes, gdb_non_printing_memory_disassembler and
gdb_non_printing_buffer_disassembler.
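
To make the shape of this concrete, here is a cut-down, standalone
model of the layout described above.  This is only a sketch and not
the GDB code: fake_disassemble_info, base_info, non_printing,
non_printing_buffer, non_printing_memory, fake_print_insn, and the
fixed per-object insn_size are all stand-ins invented for this
example.  The one point it tries to show is that the common base sets
application_data, so a callback can always get back to the owning
object, whichever sub-class supplied the bytes.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

struct fake_disassemble_info;
using read_fn = int (*) (std::uint64_t addr, unsigned char *out,
                         unsigned int len, fake_disassemble_info *info);

/* Stand-in for libopcodes' disassemble_info (only the fields used here).  */
struct fake_disassemble_info
{
  void *application_data = nullptr;
  read_fn read_memory_func = nullptr;
  const unsigned char *buffer = nullptr;
  std::size_t buffer_length = 0;
  std::uint64_t buffer_vma = 0;
};

/* Plays the role of the common base: owns the info struct and always
   points application_data back at the owning object.  */
struct base_info
{
  explicit base_info (unsigned int insn_size) : m_insn_size (insn_size)
  { m_di.application_data = this; }
  base_info (const base_info &) = delete;
  base_info &operator= (const base_info &) = delete;

  fake_disassemble_info *disasm_info () { return &m_di; }
  unsigned int insn_size () const { return m_insn_size; }

protected:
  fake_disassemble_info m_di;

private:
  unsigned int m_insn_size;   /* Hypothetical: fixed length per "arch".  */
};

/* No output stream at all; the sub-class picks how bytes are read.  */
struct non_printing : base_info
{
  non_printing (unsigned int insn_size, read_fn read_memory_func)
    : base_info (insn_size)
  { m_di.read_memory_func = read_memory_func; }
};

/* Reads the instruction from a caller supplied buffer.  */
struct non_printing_buffer : non_printing
{
  non_printing_buffer (unsigned int insn_size, const unsigned char *buffer,
                       std::size_t length, std::uint64_t insn_address)
    : non_printing (insn_size, read_buffer)
  {
    m_di.buffer = buffer;
    m_di.buffer_length = length;
    m_di.buffer_vma = insn_address;
  }

private:
  static int read_buffer (std::uint64_t addr, unsigned char *out,
                          unsigned int len, fake_disassemble_info *info)
  {
    if (addr < info->buffer_vma)
      return -1;
    std::uint64_t off = addr - info->buffer_vma;
    if (off > info->buffer_length || len > info->buffer_length - off)
      return -1;
    std::memcpy (out, info->buffer + off, len);
    return 0;
  }
};

/* Reads the instruction from (fake) target memory.  */
struct non_printing_memory : non_printing
{
  explicit non_printing_memory (unsigned int insn_size)
    : non_printing (insn_size, read_target)
  {}

private:
  static int read_target (std::uint64_t addr, unsigned char *out,
                          unsigned int len, fake_disassemble_info *)
  {
    static const unsigned char fake_memory[8] = {1, 2, 3, 4, 5, 6, 7, 8};
    if (addr > sizeof (fake_memory) || len > sizeof (fake_memory) - addr)
      return -1;
    std::memcpy (out, fake_memory + addr, len);
    return 0;
  }
};

/* Plays the role of an arch callback that needs the owning object.
   Returns the instruction length, or -1 on failure.  This model checks
   for a missing owner; a callback that doesn't check would dereference
   a null pointer instead.  */
inline int
fake_print_insn (std::uint64_t addr, fake_disassemble_info *info)
{
  auto *owner = static_cast<base_info *> (info->application_data);
  if (owner == nullptr)
    return -1;
  unsigned char bytes[8];
  unsigned int len = owner->insn_size ();
  if (len > sizeof (bytes)
      || info->read_memory_func (addr, bytes, len, info) != 0)
    return -1;
  return static_cast<int> (len);
}
```

In this model, calling fake_print_insn with the disasm_info () of
either sub-class succeeds as long as the read stays in range, while a
bare fake_disassemble_info that nothing has wrapped is rejected, which
is the situation gdb_buffered_insn_length was in before this commit.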

The new selftests now pass, but otherwise, there should be no user
visible changes after this commit.
---
 gdb/arc-linux-tdep.c   | 15 +++----
 gdb/arc-tdep.c         | 29 +++-----------
 gdb/arc-tdep.h         |  5 ---
 gdb/disasm-selftests.c | 86 ++++++++++++++++++++++++++++++++---------
 gdb/disasm.c           | 88 ++++++++++++++++++------------------------
 gdb/disasm.h           | 56 ++++++++++++++++++++++++---
 gdb/s12z-tdep.c        | 26 +------------
 7 files changed, 170 insertions(+), 135 deletions(-)

diff --git a/gdb/arc-linux-tdep.c b/gdb/arc-linux-tdep.c
index 13595f2e8e9..04ca38f1355 100644
--- a/gdb/arc-linux-tdep.c
+++ b/gdb/arc-linux-tdep.c
@@ -356,7 +356,7 @@ arc_linux_sw_breakpoint_from_kind (struct gdbarch *gdbarch,
    */
 
 static std::vector<CORE_ADDR>
-handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
+handle_atomic_sequence (arc_instruction insn, disassemble_info *di)
 {
   const int atomic_seq_len = 24;    /* Instruction sequence length.  */
   std::vector<CORE_ADDR> next_pcs;
@@ -374,7 +374,7 @@ handle_atomic_sequence (arc_instruction insn, disassemble_info &di)
   for (int insn_count = 0; insn_count < atomic_seq_len; ++insn_count)
     {
       arc_insn_decode (arc_insn_get_linear_next_pc (insn),
-		       &di, arc_delayed_print_insn, &insn);
+		       di, arc_delayed_print_insn, &insn);
 
       if (insn.insn_class == BRCC)
         {
@@ -412,15 +412,15 @@ arc_linux_software_single_step (struct regcache *regcache)
 {
   struct gdbarch *gdbarch = regcache->arch ();
   arc_gdbarch_tdep *tdep = (arc_gdbarch_tdep *) gdbarch_tdep (gdbarch);
-  struct disassemble_info di = arc_disassemble_info (gdbarch);
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   /* Read current instruction.  */
   struct arc_instruction curr_insn;
-  arc_insn_decode (regcache_read_pc (regcache), &di, arc_delayed_print_insn,
-		   &curr_insn);
+  arc_insn_decode (regcache_read_pc (regcache), dis.disasm_info (),
+		   arc_delayed_print_insn, &curr_insn);
 
   if (curr_insn.insn_class == LLOCK)
-    return handle_atomic_sequence (curr_insn, di);
+    return handle_atomic_sequence (curr_insn, dis.disasm_info ());
 
   CORE_ADDR next_pc = arc_insn_get_linear_next_pc (curr_insn);
   std::vector<CORE_ADDR> next_pcs;
@@ -431,7 +431,8 @@ arc_linux_software_single_step (struct regcache *regcache)
   if (curr_insn.has_delay_slot)
     {
       struct arc_instruction next_insn;
-      arc_insn_decode (next_pc, &di, arc_delayed_print_insn, &next_insn);
+      arc_insn_decode (next_pc, dis.disasm_info (), arc_delayed_print_insn,
+		       &next_insn);
       next_pcs.push_back (arc_insn_get_linear_next_pc (next_insn));
     }
   else
diff --git a/gdb/arc-tdep.c b/gdb/arc-tdep.c
index 3edfd466f3b..2f96e24a734 100644
--- a/gdb/arc-tdep.c
+++ b/gdb/arc-tdep.c
@@ -1306,24 +1306,6 @@ arc_is_in_prologue (struct gdbarch *gdbarch, const struct arc_instruction &insn,
   return false;
 }
 
-/* See arc-tdep.h.  */
-
-struct disassemble_info
-arc_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
 /* Analyze the prologue and update the corresponding frame cache for the frame
    unwinder for unwinding frames that doesn't have debug info.  In such
    situation GDB attempts to parse instructions in the prologue to understand
@@ -1394,9 +1376,10 @@ arc_analyze_prologue (struct gdbarch *gdbarch, const CORE_ADDR entrypoint,
   while (current_prologue_end < limit_pc)
     {
       struct arc_instruction insn;
-      struct disassemble_info di = arc_disassemble_info (gdbarch);
-      arc_insn_decode (current_prologue_end, &di, arc_delayed_print_insn,
-		       &insn);
+
+      struct gdb_non_printing_memory_disassembler dis (gdbarch);
+      arc_insn_decode (current_prologue_end, dis.disasm_info (),
+		       arc_delayed_print_insn, &insn);
 
       if (arc_debug)
 	arc_insn_dump (insn);
@@ -2460,8 +2443,8 @@ dump_arc_instruction_command (const char *args, int from_tty)
 
   CORE_ADDR address = value_as_address (val);
   struct arc_instruction insn;
-  struct disassemble_info di = arc_disassemble_info (target_gdbarch ());
-  arc_insn_decode (address, &di, arc_delayed_print_insn, &insn);
+  struct gdb_non_printing_memory_disassembler dis (target_gdbarch ());
+  arc_insn_decode (address, dis.disasm_info (), arc_delayed_print_insn, &insn);
   arc_insn_dump (insn);
 }
 
diff --git a/gdb/arc-tdep.h b/gdb/arc-tdep.h
index ceca003204f..53e5d8476fc 100644
--- a/gdb/arc-tdep.h
+++ b/gdb/arc-tdep.h
@@ -186,11 +186,6 @@ arc_arch_is_em (const struct bfd_arch_info* arch)
    can't be set to an actual NULL value - that would cause a crash.  */
 int arc_delayed_print_insn (bfd_vma addr, struct disassemble_info *info);
 
-/* Return properly initialized disassemble_info for ARC disassembler - it will
-   not print disassembled instructions to stderr.  */
-
-struct disassemble_info arc_disassemble_info (struct gdbarch *gdbarch);
-
 /* Get branch/jump target address for the INSN.  Note that this function
    returns branch target and doesn't evaluate if this branch is taken or not.
    For the indirect jumps value depends in register state, hence can change.
diff --git a/gdb/disasm-selftests.c b/gdb/disasm-selftests.c
index 4f5667bc4e2..db2d1e0ac59 100644
--- a/gdb/disasm-selftests.c
+++ b/gdb/disasm-selftests.c
@@ -25,13 +25,19 @@
 
 namespace selftests {
 
-/* Test disassembly of one instruction.  */
+/* Return a pointer to a buffer containing an instruction that can be
+   disassembled for architecture GDBARCH.  *LEN will be set to the length
+   of the returned buffer.
 
-static void
-print_one_insn_test (struct gdbarch *gdbarch)
+   If there's no known instruction to disassemble for GDBARCH (because we
+   haven't figured one out, not because no instructions exist) then nullptr
+   is returned, and *LEN is set to 0.  */
+
+static const gdb_byte *
+get_test_insn (struct gdbarch *gdbarch, size_t *len)
 {
-  size_t len = 0;
-  const gdb_byte *insn = NULL;
+  *len = 0;
+  const gdb_byte *insn = nullptr;
 
   switch (gdbarch_bfd_arch_info (gdbarch)->arch)
     {
@@ -40,27 +46,27 @@ print_one_insn_test (struct gdbarch *gdbarch)
       static const gdb_byte bfin_insn[] = {0x17, 0xe1, 0xff, 0xff};
 
       insn = bfin_insn;
-      len = sizeof (bfin_insn);
+      *len = sizeof (bfin_insn);
       break;
     case bfd_arch_arm:
       /* mov     r0, #0 */
       static const gdb_byte arm_insn[] = {0x0, 0x0, 0xa0, 0xe3};
 
       insn = arm_insn;
-      len = sizeof (arm_insn);
+      *len = sizeof (arm_insn);
       break;
     case bfd_arch_ia64:
       /* We get:
 	 internal-error: gdbarch_sw_breakpoint_from_kind:
 	 Assertion `gdbarch->sw_breakpoint_from_kind != NULL' failed.  */
-      return;
+      return insn;
     case bfd_arch_mep:
       /* Disassembles as '*unknown*' insn, then len self-check fails.  */
-      return;
+      return insn;
     case bfd_arch_mips:
       if (gdbarch_bfd_arch_info (gdbarch)->mach == bfd_mach_mips16)
 	/* Disassembles insn, but len self-check fails.  */
-	return;
+	return insn;
       goto generic_case;
     case bfd_arch_tic6x:
       /* Disassembles as '<undefined instruction 0x56454314>' insn, but len
@@ -68,7 +74,7 @@ print_one_insn_test (struct gdbarch *gdbarch)
       goto generic_case;
     case bfd_arch_xtensa:
       /* Disassembles insn, but len self-check fails.  */
-      return;
+      return insn;
     case bfd_arch_or1k:
       /* Disassembles as '*unknown*' insn, but len self-check passes, so let's
 	 allow it.  */
@@ -78,14 +84,14 @@ print_one_insn_test (struct gdbarch *gdbarch)
       static const gdb_byte s390_insn[] = {0x07, 0x07};
 
       insn = s390_insn;
-      len = sizeof (s390_insn);
+      *len = sizeof (s390_insn);
       break;
     case bfd_arch_xstormy16:
       /* nop */
       static const gdb_byte xstormy16_insn[] = {0x0, 0x0};
 
       insn = xstormy16_insn;
-      len = sizeof (xstormy16_insn);
+      *len = sizeof (xstormy16_insn);
       break;
     case bfd_arch_nios2:
     case bfd_arch_score:
@@ -96,19 +102,19 @@ print_one_insn_test (struct gdbarch *gdbarch)
       {
 	int bplen;
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, 4, &bplen);
-	len = bplen;
+	*len = bplen;
       }
       break;
     case bfd_arch_arc:
       /* PR 21003 */
       if (gdbarch_bfd_arch_info (gdbarch)->mach == bfd_mach_arc_arc601)
-	return;
+	return insn;
       goto generic_case;
     case bfd_arch_z80:
       {
 	int bplen;
 	insn = gdbarch_sw_breakpoint_from_kind (gdbarch, 0x0008, &bplen);
-	len = bplen;
+	*len = bplen;
       }
       break;
     case bfd_arch_i386:
@@ -118,7 +124,7 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	   opcodes rejects an attempt to disassemble for an arch with
 	   a 64-bit address size when bfd_vma is 32-bit.  */
 	if (info->bits_per_address > sizeof (bfd_vma) * CHAR_BIT)
-	  return;
+	  return insn;
       }
       /* fall through */
     default:
@@ -171,11 +177,25 @@ print_one_insn_test (struct gdbarch *gdbarch)
 	/* Assert that we have found an instruction to disassemble.  */
 	SELF_CHECK (found);
 
-	len = bplen;
+	*len = bplen;
 	break;
       }
     }
-  SELF_CHECK (len > 0);
+  SELF_CHECK (*len > 0);
+
+  return insn;
+}
+
+/* Test disassembly of one instruction.  */
+
+static void
+print_one_insn_test (struct gdbarch *gdbarch)
+{
+  size_t len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &len);
+
+  if (insn == nullptr)
+    return;
 
   /* Test gdb_disassembler for a given gdbarch by reading data from a
      pre-allocated buffer.  If you want to see the disassembled
@@ -234,6 +254,32 @@ print_one_insn_test (struct gdbarch *gdbarch)
   SELF_CHECK (di.print_insn (0) == len);
 }
 
+/* Test the gdb_buffered_insn_length function.  */
+
+static void
+buffered_insn_length_test (struct gdbarch *gdbarch)
+{
+  size_t buf_len;
+  const gdb_byte *insn = get_test_insn (gdbarch, &buf_len);
+
+  if (insn == nullptr)
+    return;
+
+  /* The tic6x architecture is VLIW.  Disassembling requires that the
+     entire instruction bundle be available.  However, the buffer we got
+     back from get_test_insn only contains a single instruction, which is
+     just part of an instruction bundle.  As a result, the disassembly will
+     fail.  To avoid this, skip tic6x tests now.  */
+  if (gdbarch_bfd_arch_info (gdbarch)->arch == bfd_arch_tic6x)
+    return;
+
+  CORE_ADDR insn_address = 0;
+  int calculated_len = gdb_buffered_insn_length (gdbarch, insn, buf_len,
+						 insn_address);
+
+  SELF_CHECK (calculated_len == buf_len);
+}
+
 /* Test disassembly on memory error.  */
 
 static void
@@ -294,4 +340,6 @@ _initialize_disasm_selftests ()
 					 selftests::print_one_insn_test);
   selftests::register_test_foreach_arch ("memory_error",
 					 selftests::memory_error_test);
+  selftests::register_test_foreach_arch ("buffered_insn_length",
+					 selftests::buffered_insn_length_test);
 }
diff --git a/gdb/disasm.c b/gdb/disasm.c
index 4af40c916b2..53cd6f5b6bb 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -1003,66 +1003,56 @@ gdb_insn_length (struct gdbarch *gdbarch, CORE_ADDR addr)
   return gdb_print_insn (gdbarch, addr, &null_stream, NULL);
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything.  Always returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (2, 3)
-gdb_disasm_null_printf (void *stream, const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_func (void *stream,
+						  const char *format, ...)
 {
   return 0;
 }
 
-/* An fprintf-function for use by the disassembler when we know we don't
-   want to print anything, and the disassembler is using style.  Always
-   returns success.  */
+/* See disasm.h.  */
 
-static int ATTRIBUTE_PRINTF (3, 4)
-gdb_disasm_null_styled_printf (void *stream,
-			       enum disassembler_style style,
-			       const char *format, ...)
+int
+gdb_non_printing_disassembler::null_fprintf_styled_func
+  (void *stream, enum disassembler_style style, const char *format, ...)
 {
   return 0;
 }
 
 /* See disasm.h.  */
 
-void
-init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
+int
+gdb_non_printing_memory_disassembler::dis_asm_read_memory
+  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
+   struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
-			 gdb_disasm_null_styled_printf);
+  return target_read_code (memaddr, myaddr, length);
 }
 
-/* Initialize a struct disassemble_info for gdb_buffered_insn_length.
-   Upon return, *DISASSEMBLER_OPTIONS_HOLDER owns the string pointed
-   to by DI.DISASSEMBLER_OPTIONS.  */
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from a buffer passed to the constructor.  */
 
-static void
-gdb_buffered_insn_length_init_dis (struct gdbarch *gdbarch,
-				   struct disassemble_info *di,
-				   const gdb_byte *insn, int max_len,
-				   CORE_ADDR addr,
-				   std::string *disassembler_options_holder)
+struct gdb_non_printing_buffer_disassembler
+  : public gdb_non_printing_disassembler
 {
-  init_disassemble_info_for_no_printing (di);
-
-  /* init_disassemble_info installs buffer_read_memory, etc.
-     so we don't need to do that here.
-     The cast is necessary until disassemble_info is const-ified.  */
-  di->buffer = (gdb_byte *) insn;
-  di->buffer_length = max_len;
-  di->buffer_vma = addr;
-
-  di->arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di->mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di->endian = gdbarch_byte_order (gdbarch);
-  di->endian_code = gdbarch_byte_order_for_code (gdbarch);
-
-  *disassembler_options_holder = get_all_disassembler_options (gdbarch);
-  if (!disassembler_options_holder->empty ())
-    di->disassembler_options = disassembler_options_holder->c_str ();
-  disassemble_init_for_target (di);
-}
+  /* Constructor.  GDBARCH is the architecture to disassemble for, BUFFER
+     contains the instruction to disassemble, and INSN_ADDRESS is the
+     address (in target memory) of the instruction to disassemble.  */
+  gdb_non_printing_buffer_disassembler (struct gdbarch *gdbarch,
+					gdb::array_view<const gdb_byte> buffer,
+					CORE_ADDR insn_address)
+    : gdb_non_printing_disassembler (gdbarch, nullptr)
+  {
+    /* The cast is necessary until disassemble_info is const-ified.  */
+    m_di.buffer = (gdb_byte *) buffer.data ();
+    m_di.buffer_length = buffer.size ();
+    m_di.buffer_vma = insn_address;
+  }
+};
 
 /* Return the length in bytes of INSN.  MAX_LEN is the size of the
    buffer containing INSN.  */
@@ -1071,14 +1061,10 @@ int
 gdb_buffered_insn_length (struct gdbarch *gdbarch,
 			  const gdb_byte *insn, int max_len, CORE_ADDR addr)
 {
-  struct disassemble_info di;
-  std::string disassembler_options_holder;
-
-  gdb_buffered_insn_length_init_dis (gdbarch, &di, insn, max_len, addr,
-				     &disassembler_options_holder);
-
-  int result = gdb_print_insn_1 (gdbarch, addr, &di);
-  disassemble_free_target (&di);
+  gdb::array_view<const gdb_byte> buffer
+    = gdb::make_array_view (insn, max_len);
+  gdb_non_printing_buffer_disassembler dis (gdbarch, buffer, addr);
+  int result = gdb_print_insn_1 (gdbarch, addr, dis.disasm_info ());
   return result;
 }
 
diff --git a/gdb/disasm.h b/gdb/disasm.h
index f31ca92b038..ec5120351a1 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -136,6 +136,56 @@ struct gdb_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* A basic disassembler that doesn't actually print anything.  */
+
+struct gdb_non_printing_disassembler : public gdb_disassemble_info
+{
+  gdb_non_printing_disassembler (struct gdbarch *gdbarch,
+				 read_memory_ftype read_memory_func)
+    : gdb_disassemble_info (gdbarch, nullptr /* stream */,
+			    read_memory_func,
+			    nullptr /* memory_error_func */,
+			    nullptr /* print_address_func */,
+			    null_fprintf_func,
+			    null_fprintf_styled_func)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Callback used as the disassemble_info's fprintf_func callback, this
+     doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_func (void *stream, const char *format, ...)
+    ATTRIBUTE_PRINTF(2,3);
+
+  /* Callback used as the disassemble_info's fprintf_styled_func callback,
+     this doesn't write anything to STREAM, but just returns 0.  */
+  static int null_fprintf_styled_func (void *stream,
+				       enum disassembler_style style,
+				       const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+};
+
+/* A non-printing disassemble_info management class.  The disassemble_info
+   setup by this class will not print anything to the output stream (there
+   is no output stream), and the instruction to be disassembled will be
+   read from target memory.  */
+
+struct gdb_non_printing_memory_disassembler
+  : public gdb_non_printing_disassembler
+{
+  /* Constructor.  GDBARCH is the architecture to disassemble for.  */
+  gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
+    :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
+  { /* Nothing.  */ }
+
+private:
+
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
@@ -278,10 +328,4 @@ extern char *get_disassembler_options (struct gdbarch *gdbarch);
 
 extern void set_disassembler_options (const char *options);
 
-/* Setup DINFO with its output function and output stream setup so that
-   nothing is printed while disassembling.  */
-
-extern void init_disassemble_info_for_no_printing
-  (struct disassemble_info *dinfo);
-
 #endif
diff --git a/gdb/s12z-tdep.c b/gdb/s12z-tdep.c
index 5394c1bbf5e..4e33faaea9a 100644
--- a/gdb/s12z-tdep.c
+++ b/gdb/s12z-tdep.c
@@ -141,27 +141,6 @@ s12z_dwarf_reg_to_regnum (struct gdbarch *gdbarch, int num)
 
 /* Support functions for frame handling.  */
 
-
-/* Return a disassemble_info initialized for s12z disassembly, however,
-   the disassembler will not actually print anything.  */
-
-static struct disassemble_info
-s12z_disassemble_info (struct gdbarch *gdbarch)
-{
-  struct disassemble_info di;
-  init_disassemble_info_for_no_printing (&di);
-  di.arch = gdbarch_bfd_arch_info (gdbarch)->arch;
-  di.mach = gdbarch_bfd_arch_info (gdbarch)->mach;
-  di.endian = gdbarch_byte_order (gdbarch);
-  di.read_memory_func = [](bfd_vma memaddr, gdb_byte *myaddr,
-			   unsigned int len, struct disassemble_info *info)
-    {
-      return target_read_code (memaddr, myaddr, len);
-    };
-  return di;
-}
-
-
 /* A struct (based on mem_read_abstraction_base) to read memory
    through the disassemble_info API.  */
 struct mem_read_abstraction
@@ -332,15 +311,14 @@ s12z_frame_cache (struct frame_info *this_frame, void **prologue_cache)
   int frame_size = 0;
   int saved_frame_size = 0;
 
-  struct disassemble_info di = s12z_disassemble_info (gdbarch);
-
+  struct gdb_non_printing_memory_disassembler dis (gdbarch);
 
   struct mem_read_abstraction mra;
   mra.base.read = (int (*)(mem_read_abstraction_base*,
 			   int, size_t, bfd_byte*)) abstract_read_memory;
   mra.base.advance = advance ;
   mra.base.posn = posn;
-  mra.info = &di;
+  mra.info = dis.disasm_info ();
 
   while (this_pc > addr)
     {
-- 
2.25.4



* [PUSHED 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c
  2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
                               ` (4 preceding siblings ...)
  2022-06-15  9:04             ` [PUSHED 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
@ 2022-06-15  9:04             ` Andrew Burgess
  5 siblings, 0 replies; 80+ messages in thread
From: Andrew Burgess @ 2022-06-15  9:04 UTC (permalink / raw)
  To: gdb-patches; +Cc: Andrew Burgess

After the recent restructuring of the disassembler code, GDB has ended
up with two identical class static functions, both called
dis_asm_read_memory, with identical implementations.

My first thought was to move these out of their respective classes,
and just make them global functions, then I'd only need a single
copy.

And maybe that's the right way to go.  But I disliked that by doing
that I lose the encapsulation of the method with the corresponding
disassembler class.

So, instead, I placed the static method into its own class, and had
both the gdb_non_printing_memory_disassembler and gdb_disassembler
classes inherit from this new class as an additional base-class.

In terms of code generated, I don't think there's any significant
difference with this approach, but I think this better reflects how
the function is closely tied to the disassembler.
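
As a minimal, standalone sketch of that pattern (the names info_stub,
memory_reader, info_base, printing_disassembler, and
non_printing_memory_disassembler are invented for this example, and
the private inheritance is a choice made for the sketch rather than a
statement about the GDB classes):

```cpp
#include <cstdint>

struct info_stub;   /* Stand-in for disassemble_info; never defined.  */
using read_fn = int (*) (std::uint64_t addr, unsigned char *out,
                         unsigned int len, info_stub *info);

/* The helper holds only a static function, so using it as an extra
   base-class adds no data members to the classes that inherit it.  */
struct memory_reader
{
  static int read_memory (std::uint64_t addr, unsigned char *out,
                          unsigned int len, info_stub *)
  {
    /* Fake "target memory": each byte is the low 8 bits of its address.  */
    for (unsigned int i = 0; i < len; ++i)
      out[i] = static_cast<unsigned char> (addr + i);
    return 0;
  }
};

/* Stand-in for the common base that stores the read callback.  */
struct info_base
{
  explicit info_base (read_fn read_memory_func) : m_read (read_memory_func) {}
  read_fn reader () const { return m_read; }

private:
  read_fn m_read;
};

/* Both classes pull in the single shared definition, but it stays
   scoped to the disassembler classes rather than becoming a global.  */
struct printing_disassembler : info_base, private memory_reader
{
  printing_disassembler () : info_base (read_memory) {}
};

struct non_printing_memory_disassembler : info_base, private memory_reader
{
  non_printing_memory_disassembler () : info_base (read_memory) {}
};
```

Here both classes end up installing the very same function pointer,
so there is one copy of the reader, but callers outside the class
hierarchy still can't name it through the disassembler classes.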

There should be no user visible changes after this commit.
---
 gdb/disasm.c | 16 +++-------------
 gdb/disasm.h | 29 +++++++++++++++++------------
 2 files changed, 20 insertions(+), 25 deletions(-)

diff --git a/gdb/disasm.c b/gdb/disasm.c
index 53cd6f5b6bb..c6edc92930d 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -132,9 +132,9 @@ line_has_code_p (htab_t table, struct symtab *symtab, int line)
 /* Wrapper of target_read_code.  */
 
 int
-gdb_disassembler::dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				       unsigned int len,
-				       struct disassemble_info *info)
+gdb_disassembler_memory_reader::dis_asm_read_memory
+  (bfd_vma memaddr, gdb_byte *myaddr, unsigned int len,
+   struct disassemble_info *info)
 {
   return target_read_code (memaddr, myaddr, len);
 }
@@ -1021,16 +1021,6 @@ gdb_non_printing_disassembler::null_fprintf_styled_func
   return 0;
 }
 
-/* See disasm.h.  */
-
-int
-gdb_non_printing_memory_disassembler::dis_asm_read_memory
-  (bfd_vma memaddr, bfd_byte *myaddr, unsigned int length,
-   struct disassemble_info *dinfo)
-{
-  return target_read_code (memaddr, myaddr, length);
-}
-
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
diff --git a/gdb/disasm.h b/gdb/disasm.h
index ec5120351a1..da03e130526 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -165,31 +165,39 @@ struct gdb_non_printing_disassembler : public gdb_disassemble_info
     ATTRIBUTE_PRINTF(3,4);
 };
 
+/* This is a helper class, for use as an additional base-class, by some of
+   the disassembler classes below.  This class just defines a static method
+   for reading from target memory, which can then be used by the various
+   disassembler sub-classes.  */
+
+struct gdb_disassembler_memory_reader
+{
+  /* Implements the read_memory_func disassemble_info callback.  */
+  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
+				  unsigned int len,
+				  struct disassemble_info *info);
+};
+
 /* A non-printing disassemble_info management class.  The disassemble_info
    setup by this class will not print anything to the output stream (there
    is no output stream), and the instruction to be disassembled will be
    read from target memory.  */
 
 struct gdb_non_printing_memory_disassembler
-  : public gdb_non_printing_disassembler
+  : public gdb_non_printing_disassembler,
+    private gdb_disassembler_memory_reader
 {
   /* Constructor.  GDBARCH is the architecture to disassemble for.  */
   gdb_non_printing_memory_disassembler (struct gdbarch *gdbarch)
     :gdb_non_printing_disassembler (gdbarch, dis_asm_read_memory)
   { /* Nothing.  */ }
-
-private:
-
-  /* Implements the read_memory_func disassemble_info callback.  */
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
 };
 
 /* A dissassembler class that provides 'print_insn', a method for
    disassembling a single instruction to the output stream.  */
 
-struct gdb_disassembler : public gdb_printing_disassembler
+struct gdb_disassembler : public gdb_printing_disassembler,
+			  private gdb_disassembler_memory_reader
 {
   gdb_disassembler (struct gdbarch *gdbarch, struct ui_file *file)
     : gdb_disassembler (gdbarch, file, dis_asm_read_memory)
@@ -239,9 +247,6 @@ struct gdb_disassembler : public gdb_printing_disassembler
      (currently just to addresses and symbols) as it goes.  */
   static bool use_ext_lang_colorization_p;
 
-  static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
-				  unsigned int len,
-				  struct disassemble_info *info);
   static void dis_asm_memory_error (int err, bfd_vma memaddr,
 				    struct disassemble_info *info);
   static void dis_asm_print_address (bfd_vma addr,
-- 
2.25.4


Thread overview: 80+ messages
2021-10-13 21:59 [PATCH 0/5] Add Python API for the disassembler Andrew Burgess
2021-10-13 21:59 ` [PATCH 1/5] gdb: make disassembler fprintf callback a static member function Andrew Burgess
2021-10-20 20:40   ` Tom Tromey
2021-10-22 12:51     ` Andrew Burgess
2021-10-13 21:59 ` [PATCH 2/5] gdb/python: new gdb.architecture_names function Andrew Burgess
2021-10-14  6:52   ` Eli Zaretskii
2021-10-22 12:51     ` Andrew Burgess
2021-10-20 20:40   ` Tom Tromey
2021-10-22 13:02   ` Simon Marchi
2021-10-22 17:34     ` Andrew Burgess
2021-10-22 18:42       ` Simon Marchi
2021-10-13 21:59 ` [PATCH 3/5] gdb/python: move gdb.Membuf support into a new file Andrew Burgess
2021-10-20 20:42   ` Tom Tromey
2021-10-22 12:52     ` Andrew Burgess
2021-10-13 21:59 ` [PATCH 4/5] gdb: add extension language print_insn hook Andrew Burgess
2021-10-20 21:06   ` Tom Tromey
2021-10-13 21:59 ` [PATCH 5/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
2021-10-14  7:12   ` Eli Zaretskii
2021-10-22 17:47     ` Andrew Burgess
2021-10-22 18:33       ` Eli Zaretskii
2021-10-22 13:30   ` Simon Marchi
2022-03-23 22:41 ` [PATCHv2 0/3] Add Python API for the disassembler Andrew Burgess
2022-03-23 22:41   ` [PATCHv2 1/3] gdb: add new base class to gdb_disassembler Andrew Burgess
2022-03-23 22:41   ` [PATCHv2 2/3] gdb: add extension language print_insn hook Andrew Burgess
2022-03-23 22:41   ` [PATCHv2 3/3] gdb/python: implement the print_insn extension language hook Andrew Burgess
2022-03-24  7:10     ` Eli Zaretskii
2022-03-24 19:51       ` Andrew Burgess
2022-04-04 22:19   ` [PATCHv3 0/6] Add Python API for the disassembler Andrew Burgess
2022-04-04 22:19     ` [PATCHv3 1/6] gdb: move gdb_disassembly_flag into a new disasm-flags.h file Andrew Burgess
2022-04-05 14:32       ` Tom Tromey
2022-04-06 12:18         ` Andrew Burgess
2022-04-04 22:19     ` [PATCHv3 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
2022-04-04 22:19     ` [PATCHv3 3/6] gdb: add extension language print_insn hook Andrew Burgess
2022-04-04 22:19     ` [PATCHv3 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
2022-04-05 12:04       ` Eli Zaretskii
2022-04-04 22:19     ` [PATCHv3 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
2022-04-04 22:19     ` [PATCHv3 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
2022-04-25  9:15     ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
2022-04-25  9:15       ` [PATCHv4 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
2022-05-03 13:34         ` Simon Marchi
2022-05-03 16:13           ` Andrew Burgess
2022-05-05 17:39           ` Andrew Burgess
2022-04-25  9:15       ` [PATCHv4 2/5] gdb: add extension language print_insn hook Andrew Burgess
2022-05-03 13:42         ` Simon Marchi
2022-04-25  9:15       ` [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
2022-04-25 11:26         ` Eli Zaretskii
2022-05-03 14:55         ` Simon Marchi
2022-05-05 18:17           ` Andrew Burgess
2022-05-24  1:16             ` Simon Marchi
2022-05-24  8:30               ` Andrew Burgess
2022-05-25 10:37                 ` Andrew Burgess
2022-04-25  9:15       ` [PATCHv4 4/5] gdb: refactor the non-printing disassemblers Andrew Burgess
2022-04-25  9:15       ` [PATCHv4 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
2022-05-03 10:12       ` [PATCHv4 0/5] Add Python API for the disassembler Andrew Burgess
2022-05-06 17:17       ` [PATCHv5 " Andrew Burgess
2022-05-06 17:17         ` [PATCHv5 1/5] gdb: add new base class to gdb_disassembler Andrew Burgess
2022-05-06 17:17         ` [PATCHv5 2/5] gdb: add extension language print_insn hook Andrew Burgess
2022-05-06 17:17         ` [PATCHv5 3/5] gdb/python: implement the print_insn extension language hook Andrew Burgess
2022-05-06 18:11           ` Eli Zaretskii
2022-05-18 10:08             ` Andrew Burgess
2022-05-18 12:08               ` Eli Zaretskii
2022-05-23  8:59                 ` Andrew Burgess
2022-05-23 11:23                   ` Eli Zaretskii
2022-05-06 17:17         ` [PATCHv5 4/5] gdb: refactor the non-printing disassemblers Andrew Burgess
2022-05-06 17:17         ` [PATCHv5 5/5] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
2022-05-25 10:49         ` [PATCHv6 0/6] Add Python API for the disassembler Andrew Burgess
2022-05-25 10:49           ` [PATCHv6 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
2022-05-25 10:49           ` [PATCHv6 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
2022-05-25 10:49           ` [PATCHv6 3/6] gdb: add extension language print_insn hook Andrew Burgess
2022-05-25 10:49           ` [PATCHv6 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
2022-05-25 13:32             ` Eli Zaretskii
2022-05-25 10:49           ` [PATCHv6 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
2022-05-25 10:49           ` [PATCHv6 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
2022-06-15  9:04           ` [PUSHED 0/6] Add Python API for the disassembler Andrew Burgess
2022-06-15  9:04             ` [PUSHED 1/6] gdb/python: convert gdbpy_err_fetch to use gdbpy_ref Andrew Burgess
2022-06-15  9:04             ` [PUSHED 2/6] gdb: add new base class to gdb_disassembler Andrew Burgess
2022-06-15  9:04             ` [PUSHED 3/6] gdb: add extension language print_insn hook Andrew Burgess
2022-06-15  9:04             ` [PUSHED 4/6] gdb/python: implement the print_insn extension language hook Andrew Burgess
2022-06-15  9:04             ` [PUSHED 5/6] gdb: refactor the non-printing disassemblers Andrew Burgess
2022-06-15  9:04             ` [PUSHED 6/6] gdb: unify two dis_asm_read_memory functions in disasm.c Andrew Burgess
