public inbox for binutils@sourceware.org
 help / color / mirror / Atom feed
* [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes)
@ 2022-02-16 20:53 Andrew Burgess
  2022-02-16 20:53 ` [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
                   ` (4 more replies)
  0 siblings, 5 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-02-16 20:53 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

This series is a serious attempt at what I discussed here:

  https://sourceware.org/pipermail/binutils/2021-December/118806.html

This series changes libopcodes so that this disassemblers can supply
styling information with every piece of disassembly output, e.g. is
this a register?  an address?  a mnemonic?  etc.

Users of the disassembler can then choose to make use of this
information to add styling to the disassembler output.

And that is what I do for objdump in this series.  The styling is off
by default, but can be turned on with a new command line flag:
    --disassembler-color=off|color|extended-color

I've updated GDB enough to keep it building and running after this
change, though at this point GDB doesn't make use of the new styling
information, that will come later.

All feedback would be welcome.

Thanks,
Andrew

---

Andrew Burgess (3):
  objdump/opcodes: add syntax highlighting to disassembler output
  opcodes/riscv: implement style support in the disassembler
  opcodes/i386: partially implement disassembler style support

 binutils/NEWS              |   4 +
 binutils/doc/binutils.texi |  11 ++
 binutils/objdump.c         | 245 ++++++++++++++++++++++++++++++++-----
 gdb/disasm.c               |  34 ++++-
 gdb/disasm.h               |   7 ++
 include/dis-asm.h          |  62 +++++++++-
 opcodes/dis-init.c         |   5 +-
 opcodes/disassemble.c      |  23 +++-
 opcodes/i386-dis.c         |  71 +++++++----
 opcodes/riscv-dis.c        | 147 +++++++++++-----------
 10 files changed, 475 insertions(+), 134 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output
  2022-02-16 20:53 [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Andrew Burgess
@ 2022-02-16 20:53 ` Andrew Burgess
  2022-02-28 15:54   ` Tom Tromey
  2022-02-16 20:53 ` [PATCH 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Andrew Burgess @ 2022-02-16 20:53 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

This commit adds the _option_ of having disassembler output syntax
highlighted in objdump.  This option is _off_ by default.  The new
command line options are:

  --disassembler-color=off		# The default.
  --disassembler-color=color
  --disassembler-color=extended-color

I have implemented two colour modes, using the same option names as we
use of --visualize-jumps, a basic 8-color mode ("color"), and an
extended 8bit color mode ("extended-color").

The syntax highlighting requires that each targets disassembler be
updated; each time the disassembler produces some output we now pass
through an additional parameter indicating what style should be
applied to the text.

As updating all target disassemblers is a large task, the old API is
maintained.  And so, a user of the disassembler (i.e. objdump, gdb)
must provide two functions, the current non-styled print function, and
a new, styled print function.

I don't currently have a plan for converting every single target
disassembler, my hope is that interested folk will update the
disassemblers they are interested in.  But it is possible some might
never get updated.

In this initial series I intend to convert the RISC-V disassembler
completely, and also do a partial conversion of the x86 disassembler.
Hopefully having the x86 disassembler at least partial converted will
allow more people to try this out easily and provide feedback.

In this commit I have focused on objdump.  The changes to GDB at this
point are the bare minimum required to get things compiling, GDB makes
no use of the styling information to provide any colors, that will
come later, if this commit is accepted.

This first commit in the series doesn't convert any target
disassemblers at all (the next two commits will update some targets),
so after this commit, the only color you will see in the disassembler
output, is that produced from objdump itself, e.g. from
objdump_print_addr_with_sym, where we print an address and a symbol
name, these are now printed with styling information, and so will have
colors applied (if the option is on).

Finally, my ability to pick "good" colors is ... well, terrible.  I'm
in no way committed to the colors I've picked here, so I encourage
people to suggest new colors, or wait for this commit to land, and
then patch the choice of colors.

I do have an idea about using possibly an environment variable to
allow the objdump colors to be customised, but I haven't done anything
like that in this commit, the color choices are just fixed in the code
for now.

binutils/ChangeLog:

	* NEWS: Mention new feature.
	* doc/binutils.texi (objdump): Describe --disassembler-color
	option.
	* objdump.c (disassembler_color): New global.
	(disassembler_extended_color): Likewise.
	(disassembler_in_comment): Likewise.
	(usage): Mention --disassembler-color option.
	(long_options): Add --disassembler-color option.
	(objdump_print_value): Use fprintf_styled_func instead of
	fprintf_func.
	(objdump_print_symname): Likewise.
	(objdump_print_addr_with_sym): Likewise.
	(objdump_color_for_disassembler_style): New function.
	(objdump_styled_sprintf): New function.
	(fprintf_styled): New function.
	(disassemble_jumps): Use disassemble_set_printf, and reset
	disassembler_in_comment.
	(null_styled_print): New function.
	(disassemble_bytes): Use disassemble_set_printf, and reset
	disassembler_in_comment.
	(disassemble_data): Update init_disassemble_info call.
	(main): Handle --disassembler-color option.

include/ChangeLog:

	* dis-asm.h (enum disassembler_style): New enum.
	(struct disassemble_info): Add fprintf_styled_func field, and
	created_styled_output field.
	(disassemble_set_printf): Declare.
	(init_disassemble_info): Add additional parameter.
	(INIT_DISASSEMBLE_INFO): Add additional parameter.

opcodes/ChangeLog:

	* dis-init.c (init_disassemble_info): Take extra parameter,
	initialize the new fprintf_styled_func and created_styled_output
	fields.
	* disassembler.c (disassemble_set_printf): New function definition.
---
 binutils/NEWS              |   4 +
 binutils/doc/binutils.texi |  11 ++
 binutils/objdump.c         | 245 ++++++++++++++++++++++++++++++++-----
 gdb/disasm.c               |  34 ++++-
 gdb/disasm.h               |   7 ++
 include/dis-asm.h          |  62 +++++++++-
 opcodes/dis-init.c         |   5 +-
 opcodes/disassemble.c      |  13 ++
 8 files changed, 341 insertions(+), 40 deletions(-)

diff --git a/binutils/NEWS b/binutils/NEWS
index c188b469a77..3476deb096f 100644
--- a/binutils/NEWS
+++ b/binutils/NEWS
@@ -1,5 +1,9 @@
 -*- text -*-
 
+* objdump now supports syntax highlighting of disassembler output for some
+  architectures.  Use the --disassembler-color=MODE command line flag, with
+  mode being either off, color, or extended-color.
+
 Changes in 2.38:
 
 * elfedit: Add --output-abiversion option to update ABIVERSION.
diff --git a/binutils/doc/binutils.texi b/binutils/doc/binutils.texi
index 288974be386..9d3b558a0be 100644
--- a/binutils/doc/binutils.texi
+++ b/binutils/doc/binutils.texi
@@ -2268,6 +2268,7 @@
         [@option{--prefix-strip=}@var{level}]
         [@option{--insn-width=}@var{width}]
         [@option{--visualize-jumps[=color|=extended-color|=off]}
+        [@option{--disassembler-color=[color|extended-color|off]}
         [@option{-U} @var{method}] [@option{--unicode=}@var{method}]
         [@option{-V}|@option{--version}]
         [@option{-H}|@option{--help}]
@@ -2805,6 +2806,16 @@
 after it has previously been enabled then use
 @option{visualize-jumps=off}.
 
+@item --disassembler-color=[color|extended-color|off]
+Apply syntax highlighting to the disassembler output.  The
+@option{color} argument adds color using simple terminal colors.
+Alternatively the @option{extended-color} argument will use 8bit
+colors, but these might not work on all terminals.
+
+If it is necessary to disable the @option{--disassembler-color} option
+after it has previously been enabled then use
+@option{--disassembler-color=off}.
+
 @item -W[lLiaprmfFsoORtUuTgAckK]
 @itemx --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=str-offsets,=loc,=Ranges,=pubtypes,=trace_info,=trace_abbrev,=trace_aranges,=gdb_index,=addr,=cu_index,=links,=follow-links]
 @include debug.options.texi
diff --git a/binutils/objdump.c b/binutils/objdump.c
index 24e91869bfd..97e3b1d6115 100644
--- a/binutils/objdump.c
+++ b/binutils/objdump.c
@@ -130,10 +130,19 @@ static bool visualize_jumps = false;	/* --visualize-jumps.  */
 static bool color_output = false;	/* --visualize-jumps=color.  */
 static bool extended_color_output = false; /* --visualize-jumps=extended-color.  */
 static int process_links = false;       /* --process-links.  */
+static bool disassembler_color = false; /* --disassembler-color=color.  */
+static bool disassembler_extended_color = false; /* --disassembler-color=extended-color.  */
 
 static int dump_any_debugging;
 static int demangle_flags = DMGL_ANSI | DMGL_PARAMS;
 
+/* This is reset to false each time we enter the disassembler, and set true
+   when the disassembler emits something in the dis_style_comment_start
+   style.  Once this is true, all further output on that line is done in
+   the comment style.  This only has an effect when disassembler coloring
+   is turned on.  */
+static bool disassembler_in_comment = false;
+
 /* A structure to record the sections mentioned in -j switches.  */
 struct only
 {
@@ -388,6 +397,10 @@ usage (FILE *stream, int status)
                                  Use extended 8-bit color codes\n"));
       fprintf (stream, _("\
       --visualize-jumps=off      Disable jump visualization\n\n"));
+      fprintf (stream, _("\
+      --disassembler-color=off   Disable disassembler color output.\n\n"));
+      fprintf (stream, _("\
+      --disassembler-color=color Use basic colors in disassembler output.\n\n"));
 
       list_supported_targets (program_name, stream);
       list_supported_architectures (program_name, stream);
@@ -429,7 +442,8 @@ enum option_values
     OPTION_CTF,
     OPTION_CTF_PARENT,
 #endif
-    OPTION_VISUALIZE_JUMPS
+    OPTION_VISUALIZE_JUMPS,
+    OPTION_DISASSEMBLER_COLOR
   };
 
 static struct option long_options[]=
@@ -495,6 +509,7 @@ static struct option long_options[]=
   {"version", no_argument, NULL, 'V'},
   {"visualize-jumps", optional_argument, 0, OPTION_VISUALIZE_JUMPS},
   {"wide", no_argument, NULL, 'w'},
+  {"disassembler-color", required_argument, NULL, OPTION_DISASSEMBLER_COLOR},
   {NULL, no_argument, NULL, 0}
 };
 \f
@@ -1238,7 +1253,7 @@ objdump_print_value (bfd_vma vma, struct disassemble_info *inf,
       if (*p == '\0')
 	--p;
     }
-  (*inf->fprintf_func) (inf->stream, "%s", p);
+  (*inf->fprintf_styled_func) (inf->stream, dis_style_address, "%s", p);
 }
 
 /* Print the name of a symbol.  */
@@ -1272,10 +1287,11 @@ objdump_print_symname (bfd *abfd, struct disassemble_info *inf,
 
   if (inf != NULL)
     {
-      (*inf->fprintf_func) (inf->stream, "%s", name);
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_symbol, "%s", name);
       if (version_string && *version_string != '\0')
-	(*inf->fprintf_func) (inf->stream, hidden ? "@%s" : "@@%s",
-			      version_string);
+	(*inf->fprintf_styled_func) (inf->stream, dis_style_symbol,
+				     hidden ? "@%s" : "@@%s",
+				     version_string);
     }
   else
     {
@@ -1537,31 +1553,33 @@ objdump_print_addr_with_sym (bfd *abfd, asection *sec, asymbol *sym,
   if (!no_addresses)
     {
       objdump_print_value (vma, inf, skip_zeroes);
-      (*inf->fprintf_func) (inf->stream, " ");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, " ");
     }
 
   if (sym == NULL)
     {
       bfd_vma secaddr;
 
-      (*inf->fprintf_func) (inf->stream, "<%s",
-			    sanitize_string (bfd_section_name (sec)));
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text,"<");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_symbol, "%s",
+				   sanitize_string (bfd_section_name (sec)));
       secaddr = bfd_section_vma (sec);
       if (vma < secaddr)
 	{
-	  (*inf->fprintf_func) (inf->stream, "-0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate,
+				       "-0x");
 	  objdump_print_value (secaddr - vma, inf, true);
 	}
       else if (vma > secaddr)
 	{
-	  (*inf->fprintf_func) (inf->stream, "+0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate, "+0x");
 	  objdump_print_value (vma - secaddr, inf, true);
 	}
-      (*inf->fprintf_func) (inf->stream, ">");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, ">");
     }
   else
     {
-      (*inf->fprintf_func) (inf->stream, "<");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, "<");
 
       objdump_print_symname (abfd, inf, sym);
 
@@ -1578,21 +1596,22 @@ objdump_print_addr_with_sym (bfd *abfd, asection *sec, asymbol *sym,
 	;
       else if (bfd_asymbol_value (sym) > vma)
 	{
-	  (*inf->fprintf_func) (inf->stream, "-0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate,"-0x");
 	  objdump_print_value (bfd_asymbol_value (sym) - vma, inf, true);
 	}
       else if (vma > bfd_asymbol_value (sym))
 	{
-	  (*inf->fprintf_func) (inf->stream, "+0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate, "+0x");
 	  objdump_print_value (vma - bfd_asymbol_value (sym), inf, true);
 	}
 
-      (*inf->fprintf_func) (inf->stream, ">");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, ">");
     }
 
   if (display_file_offsets)
-    inf->fprintf_func (inf->stream, _(" (File Offset: 0x%lx)"),
-			(long int)(sec->filepos + (vma - sec->vma)));
+    inf->fprintf_styled_func (inf->stream, dis_style_text,
+			      _(" (File Offset: 0x%lx)"),
+			      (long int)(sec->filepos + (vma - sec->vma)));
 }
 
 /* Print an address (VMA), symbolically if possible.
@@ -2119,6 +2138,139 @@ objdump_sprintf (SFILE *f, const char *format, ...)
   return n;
 }
 
+/* Return an integer greater than, or equal to zero, representing the color
+   for STYLE, or -1 if no color should be used.  */
+
+static int
+objdump_color_for_disassembler_style (enum disassembler_style style)
+{
+  int color = -1;
+
+  if (style == dis_style_comment_start)
+    disassembler_in_comment = true;
+
+  if (disassembler_color)
+    {
+      if (disassembler_in_comment)
+	return color;
+
+      switch (style)
+	{
+	case dis_style_symbol: color = 32; break;
+	case dis_style_mnemonic: color = 33; break;
+	case dis_style_register: color = 34; break;
+	case dis_style_address:
+	case dis_style_immediate: color = 35; break;
+	default:
+	case dis_style_text: color = -1; break;
+	}
+    }
+  else if (disassembler_extended_color)
+    {
+      if (disassembler_in_comment)
+	return 250;
+
+      switch (style)
+	{
+	case dis_style_symbol: color = 40; break;
+	case dis_style_mnemonic: color = 142; break;
+	case dis_style_register: color = 27; break;
+	case dis_style_address:
+	case dis_style_immediate: color = 134; break;
+	default:
+	case dis_style_text: color = -1; break;
+	}
+    }
+
+  return color;
+}
+
+/* Like objdump_sprintf, but add in escape sequences to highlight the
+   content according to STYLE.  */
+
+static int ATTRIBUTE_PRINTF_3
+objdump_styled_sprintf (SFILE *f, enum disassembler_style style,
+			const char *format, ...)
+{
+  size_t n;
+  va_list args;
+  int color = objdump_color_for_disassembler_style (style);
+
+  if (color >= 0)
+    {
+      while (1)
+	{
+	  size_t space = f->alloc - f->pos;
+
+	  if (disassembler_color)
+	    n = snprintf (f->buffer + f->pos, space, "\033[%dm", color);
+	  else
+	    n = snprintf (f->buffer + f->pos, space, "\033[38;5;%dm", color);
+	  if (space > n)
+	    break;
+
+	  f->alloc = (f->alloc + n) * 2;
+	  f->buffer = (char *) xrealloc (f->buffer, f->alloc);
+	}
+      f->pos += n;
+    }
+
+  while (1)
+    {
+      size_t space = f->alloc - f->pos;
+
+      va_start (args, format);
+      n = vsnprintf (f->buffer + f->pos, space, format, args);
+      va_end (args);
+
+      if (space > n)
+	break;
+
+      f->alloc = (f->alloc + n) * 2;
+      f->buffer = (char *) xrealloc (f->buffer, f->alloc);
+    }
+  f->pos += n;
+
+  if (color >= 0)
+    {
+      while (1)
+	{
+	  size_t space = f->alloc - f->pos;
+
+	  n = snprintf (f->buffer + f->pos, space, "\033[0m");
+
+	  if (space > n)
+	    break;
+
+	  f->alloc = (f->alloc + n) * 2;
+	  f->buffer = (char *) xrealloc (f->buffer, f->alloc);
+	}
+      f->pos += n;
+    }
+
+  return n;
+}
+
+/* We discard the styling information here.  This function is only used
+   when objdump is printing auxiliary information, the symbol headers, and
+   disassembly address, or the bytes of the disassembled instruction.  We
+   don't (currently) apply styling to any of this stuff, so, for now, just
+   print the content with no additional style added.  */
+
+static int ATTRIBUTE_PRINTF_3
+fprintf_styled (FILE *f, enum disassembler_style style ATTRIBUTE_UNUSED,
+		const char *fmt, ...)
+{
+  int res;
+  va_list ap;
+
+  va_start (ap, fmt);
+  res = vfprintf (f, fmt, ap);
+  va_end (ap);
+
+  return res;
+}
+
 /* Code for generating (colored) diagrams of control flow start and end
    points.  */
 
@@ -2550,8 +2702,8 @@ disassemble_jumps (struct disassemble_info * inf,
   sfile.pos = 0;
 
   inf->insn_info_valid = 0;
-  inf->fprintf_func = (fprintf_ftype) objdump_sprintf;
-  inf->stream = &sfile;
+  disassemble_set_printf (inf, &sfile, (fprintf_ftype) objdump_sprintf,
+			  (fprintf_styled_ftype) objdump_styled_sprintf);
 
   addr_offset = start_offset;
   while (addr_offset < stop_offset)
@@ -2613,6 +2765,7 @@ disassemble_jumps (struct disassemble_info * inf,
 
       /* Extract jump information.  */
       inf->insn_info_valid = 0;
+      disassembler_in_comment = false;
       octets = (*disassemble_fn) (section->vma + addr_offset, inf);
       /* Test if a jump was detected.  */
       if (inf->insn_info_valid
@@ -2633,9 +2786,8 @@ disassemble_jumps (struct disassemble_info * inf,
       addr_offset += octets / opb;
     }
 
-  inf->fprintf_func = (fprintf_ftype) fprintf;
-  inf->stream = stdout;
-
+  disassemble_set_printf (inf, (void *) stdout, (fprintf_ftype) fprintf,
+			  (fprintf_styled_ftype) fprintf_styled);
   free (sfile.buffer);
 
   /* Merge jumps.  */
@@ -2728,6 +2880,17 @@ null_print (const void * stream ATTRIBUTE_UNUSED, const char * format ATTRIBUTE_
   return 1;
 }
 
+/* Like null_print, but takes the extra STYLE argument.  As this is not
+   going to print anything, the extra argument is just ignored.  */
+
+static int
+null_styled_print (const void * stream ATTRIBUTE_UNUSED,
+		   enum disassembler_style style ATTRIBUTE_UNUSED,
+		   const char * format ATTRIBUTE_UNUSED, ...)
+{
+  return 1;
+}
+
 /* Print out jump visualization.  */
 
 static void
@@ -2938,8 +3101,9 @@ disassemble_bytes (struct disassemble_info *inf,
 	      int insn_size;
 
 	      sfile.pos = 0;
-	      inf->fprintf_func = (fprintf_ftype) objdump_sprintf;
-	      inf->stream = &sfile;
+	      disassemble_set_printf
+		(inf, &sfile, (fprintf_ftype) objdump_sprintf,
+		 (fprintf_styled_ftype) objdump_styled_sprintf);
 	      inf->bytes_per_line = 0;
 	      inf->bytes_per_chunk = 0;
 	      inf->flags = ((disassemble_all ? DISASSEMBLE_DATA : 0)
@@ -2981,10 +3145,15 @@ disassemble_bytes (struct disassemble_info *inf,
 			     twice, but we only do this when there is a high
 			     probability that there is a reloc that will
 			     affect the instruction.  */
-			  inf->fprintf_func = (fprintf_ftype) null_print;
+			  disassemble_set_printf
+			    (inf, inf->stream, (fprintf_ftype) null_print,
+			     (fprintf_styled_ftype) null_styled_print);
 			  insn_size = disassemble_fn (section->vma
 						      + addr_offset, inf);
-			  inf->fprintf_func = (fprintf_ftype) objdump_sprintf;
+			  disassemble_set_printf
+			    (inf, inf->stream,
+			     (fprintf_ftype) objdump_sprintf,
+			     (fprintf_styled_ftype) objdump_styled_sprintf);
 			}
 		    }
 
@@ -3009,12 +3178,13 @@ disassemble_bytes (struct disassemble_info *inf,
 		inf->stop_vma = section->vma + stop_offset;
 
 	      inf->stop_offset = stop_offset;
+	      disassembler_in_comment = false;
 	      insn_size = (*disassemble_fn) (section->vma + addr_offset, inf);
 	      octets = insn_size;
 
 	      inf->stop_vma = 0;
-	      inf->fprintf_func = (fprintf_ftype) fprintf;
-	      inf->stream = stdout;
+	      disassemble_set_printf (inf, stdout, (fprintf_ftype) fprintf,
+				      (fprintf_styled_ftype) fprintf_styled);
 	      if (insn_width == 0 && inf->bytes_per_line != 0)
 		octets_per_line = inf->bytes_per_line;
 	      if (insn_size < (int) opb)
@@ -3575,8 +3745,9 @@ disassemble_section (bfd *abfd, asection *section, void *inf)
 	      sf.alloc = strlen (sym->name) + 40;
 	      sf.buffer = (char*) xmalloc (sf.alloc);
 	      sf.pos = 0;
-	      di.fprintf_func = (fprintf_ftype) objdump_sprintf;
-	      di.stream = &sf;
+	      disassemble_set_printf
+		(&di, &sf, (fprintf_ftype) objdump_sprintf,
+		 (fprintf_styled_ftype) objdump_styled_sprintf);
 
 	      objdump_print_symname (abfd, &di, sym);
 
@@ -3645,8 +3816,8 @@ disassemble_data (bfd *abfd)
       ++sorted_symcount;
     }
 
-  init_disassemble_info (&disasm_info, stdout, (fprintf_ftype) fprintf);
-
+  init_disassemble_info (&disasm_info, stdout, (fprintf_ftype) fprintf,
+			 (fprintf_styled_ftype) fprintf_styled);
   disasm_info.application_data = (void *) &aux;
   aux.abfd = abfd;
   aux.require_sec = false;
@@ -5487,6 +5658,16 @@ main (int argc, char **argv)
 		nonfatal (_("unrecognized argument to --visualize-option"));
 	    }
 	  break;
+	case OPTION_DISASSEMBLER_COLOR:
+	  if (streq (optarg, "off"))
+	    disassembler_color = false;
+	  else if (streq (optarg, "color"))
+	    disassembler_color = true;
+	  else if (streq (optarg, "extended-color"))
+	    disassembler_extended_color = true;
+	  else
+	    nonfatal (_("unrecognized argument to --disassembler-color"));
+	  break;
 	case 'E':
 	  if (strcmp (optarg, "B") == 0)
 	    endian = BFD_ENDIAN_BIG;
diff --git a/gdb/disasm.c b/gdb/disasm.c
index b4cde801cb0..a1ab4d71cdd 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -177,6 +177,22 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
   return 0;
 }
 
+/* See disasm.h.  */
+
+int
+gdb_disassembler::dis_asm_styled_fprintf (void *stream,
+					  enum disassembler_style style,
+					  const char *format, ...)
+{
+  va_list args;
+
+  va_start (args, format);
+  vfprintf_filtered ((struct ui_file *) stream, format, args);
+  va_end (args);
+  /* Something non -ve.  */
+  return 0;
+}
+
 static bool
 line_is_less_than (const deprecated_dis_line_entry &mle1,
 		   const deprecated_dis_line_entry &mle2)
@@ -787,7 +803,8 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 	      && file->can_emit_style_escape ()),
     m_dest (file)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf);
+  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
+			 dis_asm_styled_fprintf);
   m_di.flavour = bfd_target_unknown_flavour;
   m_di.memory_error_func = dis_asm_memory_error;
   m_di.print_address_func = dis_asm_print_address;
@@ -954,12 +971,25 @@ gdb_disasm_null_printf (void *stream, const char *format, ...)
   return 0;
 }
 
+/* An fprintf-function for use by the disassembler when we know we don't
+   want to print anything, and the disassembler is using style.  Always
+   returns success.  */
+
+static int ATTRIBUTE_PRINTF (3, 4)
+gdb_disasm_null_styled_printf (void *stream,
+			       enum disassembler_style style,
+			       const char *format, ...)
+{
+  return 0;
+}
+
 /* See disasm.h.  */
 
 void
 init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf);
+  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
+			 gdb_disasm_null_styled_printf);
 }
 
 /* Initialize a struct disassemble_info for gdb_buffered_insn_length.
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 399afc5ae71..b71cd097a16 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -110,6 +110,13 @@ class gdb_disassembler
   static int dis_asm_fprintf (void *stream, const char *format, ...)
     ATTRIBUTE_PRINTF(2,3);
 
+  /* Print formatted message to STREAM, the content can be styled based on
+     STYLE if desired.  */
+  static int dis_asm_styled_fprintf (void *stream,
+				     enum disassembler_style style,
+				     const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/include/dis-asm.h b/include/dis-asm.h
index 317592448a2..c80cd378817 100644
--- a/include/dis-asm.h
+++ b/include/dis-asm.h
@@ -35,8 +35,6 @@ extern "C" {
 #include <string.h>
 #include "bfd.h"
 
-  typedef int (*fprintf_ftype) (void *, const char*, ...) ATTRIBUTE_FPTR_PRINTF_2;
-
 enum dis_insn_type
 {
   dis_noninsn,			/* Not a valid instruction.  */
@@ -49,6 +47,50 @@ enum dis_insn_type
   dis_dref2			/* Two data references in instruction.  */
 };
 
+/* When printing styled disassembler output, this describes what style
+   should be used.  */
+
+enum disassembler_style
+{
+  /* This is the default style, use this for any additional syntax
+     (e.g. commas between operands, brackets, etc), or just as a default if
+     no other style seems appropriate.  */
+  dis_style_text,
+
+  /* Use this for all instruction mnemonics.  */
+  dis_style_mnemonic,
+
+  /* Use this for any register names.  */
+  dis_style_register,
+
+  /* Use this for any immediates that are _not_ addresses.  This would
+     include immediates that are going to be added to addresses,
+     e.g. branch offsets, or load offsets.  */
+  dis_style_immediate,
+
+  /* The style for the numerical representation of an absolute address.
+     Anything that is an address offset should use the immediate style.  */
+  dis_style_address,
+
+  /* The style for a symbol's name.  The numerical address of a symbol
+     should use the address style above, this style is reserved for the
+     name.  */
+  dis_style_symbol,
+
+  /* The start of a comment that runs to the end of the line.  Anything
+     printed after a comment start might be styled differently,
+     e.g. everything might be styled as a comment, regardless of the
+     actual style used.  The disassembler itself should not try to adjust
+     the style emitted for comment content, e.g. an address emitted within
+     a comment should still be given dis_style_address, in this way it is
+     up to the user of the disassembler to decide how comments should be
+     styled.  */
+  dis_style_comment_start
+};
+
+typedef int (*fprintf_ftype) (void *, const char*, ...) ATTRIBUTE_FPTR_PRINTF_2;
+typedef int (*fprintf_styled_ftype) (void *, enum disassembler_style, const char*, ...) ATTRIBUTE_FPTR_PRINTF_3;
+
 /* This struct is passed into the instruction decoding routine,
    and is passed back out into each callback.  The various fields are used
    for conveying information from your main routine into your callbacks,
@@ -62,6 +104,7 @@ enum dis_insn_type
 typedef struct disassemble_info
 {
   fprintf_ftype fprintf_func;
+  fprintf_styled_ftype fprintf_styled_func;
   void *stream;
   void *application_data;
 
@@ -228,6 +271,9 @@ typedef struct disassemble_info
      disassembling such as the way mapping symbols are found on AArch64.  */
   bfd_vma stop_offset;
 
+  /* Set to true if the disassembler applied styling to the output,
+     otherwise, set to false.  */
+  bool created_styled_output;
 } disassemble_info;
 
 /* This struct is used to pass information about valid disassembler
@@ -337,6 +383,10 @@ extern void disassemble_init_for_target (struct disassemble_info *);
 /* Tidy any memory allocated by targets, such as info->private_data.  */
 extern void disassemble_free_target (struct disassemble_info *);
 
+/* Set the basic disassembler print functions.  */
+extern void disassemble_set_printf (struct disassemble_info *, void *,
+				    fprintf_ftype, fprintf_styled_ftype);
+
 /* Document any target specific options available from the disassembler.  */
 extern void disassembler_usage (FILE *);
 
@@ -394,11 +444,13 @@ extern bool generic_symbol_is_valid
 /* Method to initialize a disassemble_info struct.  This should be
    called by all applications creating such a struct.  */
 extern void init_disassemble_info (struct disassemble_info *dinfo, void *stream,
-				   fprintf_ftype fprintf_func);
+				   fprintf_ftype fprintf_func,
+				   fprintf_styled_ftype fprintf_styled_func);
 
 /* For compatibility with existing code.  */
-#define INIT_DISASSEMBLE_INFO(INFO, STREAM, FPRINTF_FUNC) \
-  init_disassemble_info (&(INFO), (STREAM), (fprintf_ftype) (FPRINTF_FUNC))
+#define INIT_DISASSEMBLE_INFO(INFO, STREAM, FPRINTF_FUNC, FPRINTF_STYLED_FUNC)  \
+  init_disassemble_info (&(INFO), (STREAM), (fprintf_ftype) (FPRINTF_FUNC), \
+			 (fprintf_styled_ftype) (FPRINTF_STYLED_FUNC))
 
 #ifdef __cplusplus
 }
diff --git a/opcodes/dis-init.c b/opcodes/dis-init.c
index 0b171c0f36d..59039ef8b11 100644
--- a/opcodes/dis-init.c
+++ b/opcodes/dis-init.c
@@ -25,7 +25,8 @@
 
 void
 init_disassemble_info (struct disassemble_info *info, void *stream,
-		       fprintf_ftype fprintf_func)
+		       fprintf_ftype fprintf_func,
+		       fprintf_styled_ftype fprintf_styled_func)
 {
   memset (info, 0, sizeof (*info));
 
@@ -35,6 +36,7 @@ init_disassemble_info (struct disassemble_info *info, void *stream,
   info->endian_code = info->endian;
   info->octets_per_byte = 1;
   info->fprintf_func = fprintf_func;
+  info->fprintf_styled_func = fprintf_styled_func;
   info->stream = stream;
   info->read_memory_func = buffer_read_memory;
   info->memory_error_func = perror_memory;
@@ -42,5 +44,6 @@ init_disassemble_info (struct disassemble_info *info, void *stream,
   info->symbol_at_address_func = generic_symbol_at_address;
   info->symbol_is_valid = generic_symbol_is_valid;
   info->display_endian = BFD_ENDIAN_UNKNOWN;
+  info->created_styled_output = false;
 }
 
diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index e613935eff8..15df49ea331 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -860,3 +860,16 @@ opcodes_assert (const char *file, int line)
   opcodes_error_handler (_("Please report this bug"));
   abort ();
 }
+
+/* Set the stream, and the styled and unstyled printf functions within
+   INFO.  */
+
+void
+disassemble_set_printf (struct disassemble_info *info, void *stream,
+			fprintf_ftype unstyled_printf,
+			fprintf_styled_ftype styled_printf)
+{
+  info->stream = stream;
+  info->fprintf_func = unstyled_printf;
+  info->fprintf_styled_func = styled_printf;
+}
-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/3] opcodes/riscv: implement style support in the disassembler
  2022-02-16 20:53 [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Andrew Burgess
  2022-02-16 20:53 ` [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
@ 2022-02-16 20:53 ` Andrew Burgess
  2022-02-19 10:24   ` Andrew Burgess
  2022-02-16 20:53 ` [PATCH 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 21+ messages in thread
From: Andrew Burgess @ 2022-02-16 20:53 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

Update the RISC-V disassembler to supply style information.  This
allows objdump to apply syntax highlighting to the disassembler
output (when the appropriate command line flag is used).

Ignoring colours, there should be no other user visible changes in the
output of the disassembler in either objdump or gdb.

opcodes/ChangeLog:

	* disassembler.c (disassemble_init_for_target): Set
	created_styled_output for riscv.
	* riscv-dis.c: Changed throughout to use fprintf_styled_func
	instead of fprintf_func.
---
 opcodes/disassemble.c |   1 +
 opcodes/riscv-dis.c   | 147 ++++++++++++++++++++++--------------------
 2 files changed, 78 insertions(+), 70 deletions(-)

diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index 15df49ea331..b8eed9695ea 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -708,6 +708,7 @@ disassemble_init_for_target (struct disassemble_info * info)
 #ifdef ARCH_riscv
     case bfd_arch_riscv:
       info->symbol_is_valid = riscv_symbol_is_valid;
+      info->created_styled_output = true;
       break;
 #endif
 #ifdef ARCH_wasm32
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 34724d4aec5..90475ccc011 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -165,7 +165,7 @@ arg_print (struct disassemble_info *info, unsigned long val,
 	   const char* const* array, size_t size)
 {
   const char *s = val >= size || array[val] == NULL ? "unknown" : array[val];
-  (*info->fprintf_func) (info->stream, "%s", s);
+  (*info->fprintf_styled_func) (info->stream, dis_style_text, "%s", s);
 }
 
 static void
@@ -195,11 +195,11 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
   struct riscv_private_data *pd = info->private_data;
   int rs1 = (l >> OP_SH_RS1) & OP_MASK_RS1;
   int rd = (l >> OP_SH_RD) & OP_MASK_RD;
-  fprintf_ftype print = info->fprintf_func;
+  fprintf_styled_ftype print = info->fprintf_styled_func;
   const char *opargStart;
 
   if (*oparg != '\0')
-    print (info->stream, "\t");
+    print (info->stream, dis_style_text, "\t");
 
   for (; *oparg != '\0'; oparg++)
     {
@@ -211,22 +211,22 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    {
 	    case 's': /* RS1 x8-x15.  */
 	    case 'w': /* RS1 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS1S, l) + 8]);
 	      break;
 	    case 't': /* RS2 x8-x15.  */
 	    case 'x': /* RS2 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS2S, l) + 8]);
 	      break;
 	    case 'U': /* RS1, constrained to equal RD.  */
-	      print (info->stream, "%s", riscv_gpr_names[rd]);
+	      print (info->stream, dis_style_register, "%s", riscv_gpr_names[rd]);
 	      break;
 	    case 'c': /* RS1, constrained to equal sp.  */
-	      print (info->stream, "%s", riscv_gpr_names[X_SP]);
+	      print (info->stream, dis_style_register, "%s", riscv_gpr_names[X_SP]);
 	      break;
 	    case 'V': /* RS2 */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS2, l)]);
 	      break;
 	    case 'o':
@@ -236,31 +236,31 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      if (info->mach == bfd_mach_riscv64
 		  && ((l & MASK_C_ADDIW) == MATCH_C_ADDIW) && rd != 0)
 		maybe_print_address (pd, rd, EXTRACT_CITYPE_IMM (l), 1);
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_IMM (l));
 	      break;
 	    case 'k':
-	      print (info->stream, "%d", (int)EXTRACT_CLTYPE_LW_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CLTYPE_LW_IMM (l));
 	      break;
 	    case 'l':
-	      print (info->stream, "%d", (int)EXTRACT_CLTYPE_LD_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CLTYPE_LD_IMM (l));
 	      break;
 	    case 'm':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_LWSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_LWSP_IMM (l));
 	      break;
 	    case 'n':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_LDSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_LDSP_IMM (l));
 	      break;
 	    case 'K':
-	      print (info->stream, "%d", (int)EXTRACT_CIWTYPE_ADDI4SPN_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CIWTYPE_ADDI4SPN_IMM (l));
 	      break;
 	    case 'L':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_ADDI16SP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_ADDI16SP_IMM (l));
 	      break;
 	    case 'M':
-	      print (info->stream, "%d", (int)EXTRACT_CSSTYPE_SWSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CSSTYPE_SWSP_IMM (l));
 	      break;
 	    case 'N':
-	      print (info->stream, "%d", (int)EXTRACT_CSSTYPE_SDSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CSSTYPE_SDSP_IMM (l));
 	      break;
 	    case 'p':
 	      info->target = EXTRACT_CBTYPE_IMM (l) + pc;
@@ -271,21 +271,21 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      (*info->print_address_func) (info->target, info);
 	      break;
 	    case 'u':
-	      print (info->stream, "0x%x",
+	      print (info->stream, dis_style_immediate, "0x%x",
 		     (int)(EXTRACT_CITYPE_IMM (l) & (RISCV_BIGIMM_REACH-1)));
 	      break;
 	    case '>':
-	      print (info->stream, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x3f);
+	      print (info->stream, dis_style_immediate, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x3f);
 	      break;
 	    case '<':
-	      print (info->stream, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x1f);
+	      print (info->stream, dis_style_immediate, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x1f);
 	      break;
 	    case 'T': /* Floating-point RS2.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_fpr_names[EXTRACT_OPERAND (CRS2, l)]);
 	      break;
 	    case 'D': /* Floating-point RS2 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_fpr_names[EXTRACT_OPERAND (CRS2S, l) + 8]);
 	      break;
 	    }
@@ -296,28 +296,28 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    {
 	    case 'd':
 	    case 'f':
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VD, l)]);
 	      break;
 	    case 'e':
 	      if (!EXTRACT_OPERAND (VWD, l))
-		print (info->stream, "%s", riscv_gpr_names[0]);
+		print (info->stream, dis_style_register, "%s", riscv_gpr_names[0]);
 	      else
-		print (info->stream, "%s",
+		print (info->stream, dis_style_register, "%s",
 		       riscv_vecr_names_numeric[EXTRACT_OPERAND (VD, l)]);
 	      break;
 	    case 's':
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VS1, l)]);
 	      break;
 	    case 't':
 	    case 'u': /* VS1 == VS2 already verified at this point.  */
 	    case 'v': /* VD == VS1 == VS2 already verified at this point.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VS2, l)]);
 	      break;
 	    case '0':
-	      print (info->stream, "%s", riscv_vecr_names_numeric[0]);
+	      print (info->stream, dis_style_register, "%s", riscv_vecr_names_numeric[0]);
 	      break;
 	    case 'b':
 	    case 'c':
@@ -337,25 +337,26 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 		    && !imm_vtype_res
 		    && riscv_vsew[imm_vsew] != NULL
 		    && riscv_vlmul[imm_vlmul] != NULL)
-		  print (info->stream, "%s,%s,%s,%s", riscv_vsew[imm_vsew],
+		  print (info->stream, dis_style_text, "%s,%s,%s,%s",
+			 riscv_vsew[imm_vsew],
 			 riscv_vlmul[imm_vlmul], riscv_vta[imm_vta],
 			 riscv_vma[imm_vma]);
 		else
-		  print (info->stream, "%d", imm);
+		  print (info->stream, dis_style_immediate, "%d", imm);
 	      }
 	      break;
 	    case 'i':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_VI_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_RVV_VI_IMM (l));
 	      break;
 	    case 'j':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_VI_UIMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_RVV_VI_UIMM (l));
 	      break;
 	    case 'k':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_OFFSET (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_RVV_OFFSET (l));
 	      break;
 	    case 'm':
 	      if (! EXTRACT_OPERAND (VMASK, l))
-		print (info->stream, ",%s", riscv_vecm_names_numeric[0]);
+		print (info->stream, dis_style_register, ",%s", riscv_vecm_names_numeric[0]);
 	      break;
 	    }
 	  break;
@@ -365,44 +366,44 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	case ')':
 	case '[':
 	case ']':
-	  print (info->stream, "%c", *oparg);
+	  print (info->stream, dis_style_text, "%c", *oparg);
 	  break;
 
 	case '0':
 	  /* Only print constant 0 if it is the last argument.  */
 	  if (!oparg[1])
-	    print (info->stream, "0");
+	    print (info->stream, dis_style_immediate, "0");
 	  break;
 
 	case 'b':
 	case 's':
 	  if ((l & MASK_JALR) == MATCH_JALR)
 	    maybe_print_address (pd, rs1, 0, 0);
-	  print (info->stream, "%s", riscv_gpr_names[rs1]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[rs1]);
 	  break;
 
 	case 't':
-	  print (info->stream, "%s",
+	  print (info->stream, dis_style_register, "%s",
 		 riscv_gpr_names[EXTRACT_OPERAND (RS2, l)]);
 	  break;
 
 	case 'u':
-	  print (info->stream, "0x%x",
+	  print (info->stream, dis_style_immediate, "0x%x",
 		 (unsigned)EXTRACT_UTYPE_IMM (l) >> RISCV_IMM_BITS);
 	  break;
 
 	case 'm':
-	  arg_print (info, EXTRACT_OPERAND (RM, l),
+	  arg_print (info->stream, EXTRACT_OPERAND (RM, l),
 		     riscv_rm, ARRAY_SIZE (riscv_rm));
 	  break;
 
 	case 'P':
-	  arg_print (info, EXTRACT_OPERAND (PRED, l),
+	  arg_print (info->stream, EXTRACT_OPERAND (PRED, l),
 		     riscv_pred_succ, ARRAY_SIZE (riscv_pred_succ));
 	  break;
 
 	case 'Q':
-	  arg_print (info, EXTRACT_OPERAND (SUCC, l),
+	  arg_print (info->stream, EXTRACT_OPERAND (SUCC, l),
 		     riscv_pred_succ, ARRAY_SIZE (riscv_pred_succ));
 	  break;
 
@@ -416,12 +417,12 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	  if (info->mach == bfd_mach_riscv64
 	      && ((l & MASK_ADDIW) == MATCH_ADDIW) && rs1 != 0)
 	    maybe_print_address (pd, rs1, EXTRACT_ITYPE_IMM (l), 1);
-	  print (info->stream, "%d", (int)EXTRACT_ITYPE_IMM (l));
+	  print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_ITYPE_IMM (l));
 	  break;
 
 	case 'q':
 	  maybe_print_address (pd, rs1, EXTRACT_STYPE_IMM (l), 0);
-	  print (info->stream, "%d", (int)EXTRACT_STYPE_IMM (l));
+	  print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_STYPE_IMM (l));
 	  break;
 
 	case 'a':
@@ -441,40 +442,40 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    pd->hi_addr[rd] = EXTRACT_UTYPE_IMM (l);
 	  else if ((l & MASK_C_LUI) == MATCH_C_LUI)
 	    pd->hi_addr[rd] = EXTRACT_CITYPE_LUI_IMM (l);
-	  print (info->stream, "%s", riscv_gpr_names[rd]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[rd]);
 	  break;
 
 	case 'y':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (BS, l));
+	  print (info->stream, dis_style_text, "0x%x", (int)EXTRACT_OPERAND (BS, l));
 	  break;
 
 	case 'z':
-	  print (info->stream, "%s", riscv_gpr_names[0]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[0]);
 	  break;
 
 	case '>':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (SHAMT, l));
+	  print (info->stream, dis_style_immediate, "0x%x", (int)EXTRACT_OPERAND (SHAMT, l));
 	  break;
 
 	case '<':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (SHAMTW, l));
+	  print (info->stream, dis_style_text, "0x%x", (int)EXTRACT_OPERAND (SHAMTW, l));
 	  break;
 
 	case 'S':
 	case 'U':
-	  print (info->stream, "%s", riscv_fpr_names[rs1]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[rs1]);
 	  break;
 
 	case 'T':
-	  print (info->stream, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS2, l)]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS2, l)]);
 	  break;
 
 	case 'D':
-	  print (info->stream, "%s", riscv_fpr_names[rd]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[rd]);
 	  break;
 
 	case 'R':
-	  print (info->stream, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS3, l)]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS3, l)]);
 	  break;
 
 	case 'E':
@@ -507,23 +508,23 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      }
 
 	    if (riscv_csr_hash[csr] != NULL)
-	      print (info->stream, "%s", riscv_csr_hash[csr]);
+	      print (info->stream, dis_style_text, "%s", riscv_csr_hash[csr]);
 	    else
-	      print (info->stream, "0x%x", csr);
+	      print (info->stream, dis_style_text, "0x%x", csr);
 	    break;
 	  }
 
 	case 'Y':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (RNUM, l));
+	  print (info->stream, dis_style_text, "0x%x", (int)EXTRACT_OPERAND (RNUM, l));
 	  break;
 
 	case 'Z':
-	  print (info->stream, "%d", rs1);
+	  print (info->stream, dis_style_text, "%d", rs1);
 	  break;
 
 	default:
 	  /* xgettext:c-format */
-	  print (info->stream, _("# internal error, undefined modifier (%c)"),
+	  print (info->stream, dis_style_text, _("# internal error, undefined modifier (%c)"),
 		 *opargStart);
 	  return;
 	}
@@ -623,14 +624,15 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 	    continue;
 
 	  /* It's a match.  */
-	  (*info->fprintf_func) (info->stream, "%s", op->name);
+	  (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic, "%s", op->name);
 	  print_insn_args (op->args, word, memaddr, info);
 
 	  /* Try to disassemble multi-instruction addressing sequences.  */
 	  if (pd->print_addr != (bfd_vma)-1)
 	    {
 	      info->target = pd->print_addr;
-	      (*info->fprintf_func) (info->stream, " # ");
+	      (*info->fprintf_styled_func)
+		(info->stream, dis_style_comment_start, " # ");
 	      (*info->print_address_func) (info->target, info);
 	      pd->print_addr = -1;
 	    }
@@ -672,19 +674,24 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
     case 2:
     case 4:
     case 8:
-      (*info->fprintf_func) (info->stream, ".%dbyte\t0x%llx",
-                             insnlen, (unsigned long long) word);
+      (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
+				    ".%dbyte\t", insnlen);
+      (*info->fprintf_styled_func) (info->stream, dis_style_immediate,
+				    "0x%llx", (unsigned long long) word);
       break;
     default:
       {
         int i;
-        (*info->fprintf_func) (info->stream, ".byte\t");
+	(*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
+				      ".byte\t");
         for (i = 0; i < insnlen; ++i)
           {
             if (i > 0)
-              (*info->fprintf_func) (info->stream, ", ");
-            (*info->fprintf_func) (info->stream, "0x%02x",
-                                   (unsigned int) (word & 0xff));
+	      (*info->fprintf_styled_func) (info->stream, dis_style_text,
+					    ", ");
+	    (*info->fprintf_styled_func) (info->stream, dis_style_immediate,
+					  "0x%02x",
+					  (unsigned int) (word & 0xff));
             word >>= 8;
           }
       }
@@ -863,22 +870,22 @@ riscv_disassemble_data (bfd_vma memaddr ATTRIBUTE_UNUSED,
     {
     case 1:
       info->bytes_per_line = 6;
-      (*info->fprintf_func) (info->stream, ".byte\t0x%02llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".byte\t0x%02llx",
 			     (unsigned long long) data);
       break;
     case 2:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".short\t0x%04llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".short\t0x%04llx",
 			     (unsigned long long) data);
       break;
     case 4:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".word\t0x%08llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".word\t0x%08llx",
 			     (unsigned long long) data);
       break;
     case 8:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".dword\t0x%016llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".dword\t0x%016llx",
 			     (unsigned long long) data);
       break;
     default:
-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-16 20:53 [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Andrew Burgess
  2022-02-16 20:53 ` [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
  2022-02-16 20:53 ` [PATCH 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
@ 2022-02-16 20:53 ` Andrew Burgess
  2022-02-17  9:35   ` Jan Beulich
  2022-02-17  3:57 ` [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nelson Chu
  2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
  4 siblings, 1 reply; 21+ messages in thread
From: Andrew Burgess @ 2022-02-16 20:53 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

This commit adds partial support for disassembler styling in the i386
disassembler.

The i386 disassembler collects the instruction arguments into an array
of strings, and then loops over the array printing the arguments out
later on.  The problem is that by the time we print the arguments out
it's not obvious what the type of each argument is.

Obviously this can be fixed, but I'd like to not do that as part of
this commit, rather, I'd prefer to keep this commit as small as
possible to get the basic infrastructure in place, then we can improve
on this, to add additional styling, in later commits.

For now then, I think this commit should correctly style mnemonics,
some immediates, and comments.  Everything else will be printed as
plain text, which will include most instruction arguments, unless the
argument is printed as a symbol, by calling the print_address_func
callback.

Ignoring colours, there should be no other user visible changes in the
output of the disassembler in either objdump or gdb.

opcodes/ChangeLog:

	* disassembler.c (disassemble_init_for_target): Set
	created_styled_output for i386 based targets.
	* i386-dis.c: Changed throughout to use fprintf_styled_func
	instead of fprintf_func.
---
 opcodes/disassemble.c |  9 +++++-
 opcodes/i386-dis.c    | 71 +++++++++++++++++++++++++++++--------------
 2 files changed, 56 insertions(+), 24 deletions(-)

diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index b8eed9695ea..81a90f20a9c 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -632,7 +632,14 @@ disassemble_init_for_target (struct disassemble_info * info)
       info->disassembler_needs_relocs = true;
       break;
 #endif
-
+#ifdef ARCH_i386
+    case bfd_arch_i386:
+    case bfd_arch_iamcu:
+    case bfd_arch_l1om:
+    case bfd_arch_k1om:
+      info->created_styled_output = true;
+      break;
+#endif
 #ifdef ARCH_ia64
     case bfd_arch_ia64:
       info->skip_zeroes = 16;
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index a30bda0633b..f9a2e3d6357 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -9402,8 +9402,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 
   if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
     {
-      (*ins->info->fprintf_func) (ins->info->stream,
-			     _("64-bit address is disabled"));
+      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
+					 _("64-bit address is disabled"));
       return -1;
     }
 
@@ -9456,12 +9456,18 @@ print_insn (bfd_vma pc, instr_info *ins)
 	{
 	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
 	  if (name != NULL)
-	    (*ins->info->fprintf_func) (ins->info->stream, "%s", name);
+	    (*ins->info->fprintf_styled_func)
+	      (ins->info->stream, dis_style_mnemonic, "%s", name);
 	  else
 	    {
 	      /* Just print the first byte as a .byte instruction.  */
-	      (*ins->info->fprintf_func) (ins->info->stream, ".byte 0x%x",
-				     (unsigned int) priv.the_buffer[0]);
+	      (*ins->info->fprintf_styled_func)
+		(ins->info->stream, dis_style_mnemonic, ".byte");
+	      (*ins->info->fprintf_styled_func)
+		(ins->info->stream, dis_style_text, " ");
+	      (*ins->info->fprintf_styled_func)
+		(ins->info->stream, dis_style_immediate, "0x%x",
+		 (unsigned int) priv.the_buffer[0]);
 	    }
 
 	  return 1;
@@ -9479,10 +9485,10 @@ print_insn (bfd_vma pc, instr_info *ins)
       for (i = 0;
 	   i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_func) (ins->info->stream, "%s%s",
-			       i == 0 ? "" : " ",
-			       prefix_name (ins, ins->all_prefixes[i],
-					    sizeflag));
+	(*ins->info->fprintf_styled_func)
+	  (ins->info->stream, dis_style_mnemonic, "%s%s",
+	   (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
+					     sizeflag));
       return i;
     }
 
@@ -9497,10 +9503,15 @@ print_insn (bfd_vma pc, instr_info *ins)
       /* Handle ins->prefixes before fwait.  */
       for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_func) (ins->info->stream, "%s ",
-				    prefix_name (ins, ins->all_prefixes[i],
-						 sizeflag));
-      (*ins->info->fprintf_func) (ins->info->stream, "fwait");
+	{
+	  (*ins->info->fprintf_styled_func)
+	    (ins->info->stream, dis_style_mnemonic, "%s",
+	     prefix_name (ins, ins->all_prefixes[i], sizeflag));
+	  (*ins->info->fprintf_styled_func)
+	    (ins->info->stream, dis_style_mnemonic, " ");
+	}
+      (*ins->info->fprintf_styled_func)
+	(ins->info->stream, dis_style_mnemonic, "fwait");
       return i + 1;
     }
 
@@ -9649,14 +9660,16 @@ print_insn (bfd_vma pc, instr_info *ins)
      are all 0s in inverted form.  */
   if (ins->need_vex && ins->vex.register_specifier != 0)
     {
-      (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
+					 "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
   /* If EVEX.z is set, there must be an actual mask register in use.  */
   if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
     {
-      (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
+					 "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
@@ -9667,7 +9680,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 	 the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
       if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
 	{
-	  (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       ins->used_prefixes |= PREFIX_DATA;
@@ -9694,7 +9708,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 	  || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
 	      && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
 	{
-	  (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       break;
@@ -9744,13 +9759,17 @@ print_insn (bfd_vma pc, instr_info *ins)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	(*ins->info->fprintf_func) (ins->info->stream, "%s ", name);
+	(*ins->info->fprintf_styled_func)
+	  (ins->info->stream, dis_style_mnemonic, "%s", name);
+	(*ins->info->fprintf_styled_func)
+	  (ins->info->stream, dis_style_text, " ");
       }
 
   /* Check maximum code length.  */
   if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
     {
-      (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+      (*ins->info->fprintf_styled_func)
+	(ins->info->stream, dis_style_text, "(bad)");
       return MAX_CODE_LENGTH;
     }
 
@@ -9758,7 +9777,8 @@ print_insn (bfd_vma pc, instr_info *ins)
   for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
     oappend (ins, " ");
   oappend (ins, " ");
-  (*ins->info->fprintf_func) (ins->info->stream, "%s", ins->obuf);
+  (*ins->info->fprintf_styled_func)
+    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
 
   /* The enter and bound instructions are printed with operands in the same
      order as the intel book; everything else is printed in reverse order.  */
@@ -9797,7 +9817,8 @@ print_insn (bfd_vma pc, instr_info *ins)
     if (*op_txt[i])
       {
 	if (needcomma)
-	  (*ins->info->fprintf_func) (ins->info->stream, ",");
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, ",");
 	if (ins->op_index[i] != -1 && !ins->op_riprel[i])
 	  {
 	    bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
@@ -9813,14 +9834,18 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    (*ins->info->print_address_func) (target, ins->info);
 	  }
 	else
-	  (*ins->info->fprintf_func) (ins->info->stream, "%s", op_txt[i]);
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, "%s",
+					     op_txt[i]);
 	needcomma = 1;
       }
 
   for (i = 0; i < MAX_OPERANDS; i++)
     if (ins->op_index[i] != -1 && ins->op_riprel[i])
       {
-	(*ins->info->fprintf_func) (ins->info->stream, "        # ");
+	(*ins->info->fprintf_styled_func) (ins->info->stream,
+					   dis_style_comment_start,
+					   "        # ");
 	(*ins->info->print_address_func) ((bfd_vma)
 			(ins->start_pc + (ins->codep - ins->start_codep)
 			 + ins->op_address[ins->op_index[i]]), ins->info);
-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes)
  2022-02-16 20:53 [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Andrew Burgess
                   ` (2 preceding siblings ...)
  2022-02-16 20:53 ` [PATCH 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
@ 2022-02-17  3:57 ` Nelson Chu
  2022-02-17 16:17   ` Andrew Burgess
  2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
  4 siblings, 1 reply; 21+ messages in thread
From: Nelson Chu @ 2022-02-17  3:57 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Binutils

Hi Andrew,

I get gas/testsuite/gas/riscv/insn testcase fail after applying the
series of patches.  Here is the reduced case,

$ cat tmp.s
.word 0x68c58543
fmadd.s fa0,fa1,fa2,fa3,rne
.insn r  MADD, 0, 0, a0, a1, a2, a3
.insn 0x68c58543

All these are the same instruction encoding - fmadd.s, but the first
".word" will be marked and dumped as data rather than instruction.  We
are used to get the following dump result,

$ /scratch/nelsonc/build-upstream/rv64gc-elf/build-install/bin/riscv64-unknown-elf-objdump
-d tmp.o

0000000000000000 <.text>:
   0:   68c58543                .word   0x68c58543
   4:   68c58543                fmadd.s fa0,fa1,fa2,fa3,rne
   8:   68c58543                fmadd.s fa0,fa1,fa2,fa3,rne
   c:   68c58543                fmadd.s fa0,fa1,fa2,fa3,rne

But now with the patches, I will get segmentation fault when trying to
dump fmadd.s as instruction,

$ /scratch/nelsonc/build-upstream/rv64gc-elf/build-install/bin/riscv64-unknown-elf-objdump
-d tmp.o

0000000000000000 <.text>:
   0:   68c58543                .word   0x68c58543
Segmentation fault (core dumped)

Is it possible if you could help to see what happened?  Thanks!


Nelson

On Thu, Feb 17, 2022 at 4:53 AM Andrew Burgess via Binutils
<binutils@sourceware.org> wrote:
>
> This series is a serious attempt at what I discussed here:
>
>   https://sourceware.org/pipermail/binutils/2021-December/118806.html
>
> This series changes libopcodes so that this disassemblers can supply
> styling information with every piece of disassembly output, e.g. is
> this a register?  an address?  a mnemonic?  etc.
>
> Users of the disassembler can then choose to make use of this
> information to add styling to the disassembler output.
>
> And that is what I do for objdump in this series.  The styling is off
> by default, but can be turned on with a new command line flag:
>     --disassembler-color=off|color|extended-color
>
> I've updated GDB enough to keep it building and running after this
> change, though at this point GDB doesn't make use of the new styling
> information, that will come later.
>
> All feedback would be welcome.
>
> Thanks,
> Andrew
>
> ---
>
> Andrew Burgess (3):
>   objdump/opcodes: add syntax highlighting to disassembler output
>   opcodes/riscv: implement style support in the disassembler
>   opcodes/i386: partially implement disassembler style support
>
>  binutils/NEWS              |   4 +
>  binutils/doc/binutils.texi |  11 ++
>  binutils/objdump.c         | 245 ++++++++++++++++++++++++++++++++-----
>  gdb/disasm.c               |  34 ++++-
>  gdb/disasm.h               |   7 ++
>  include/dis-asm.h          |  62 +++++++++-
>  opcodes/dis-init.c         |   5 +-
>  opcodes/disassemble.c      |  23 +++-
>  opcodes/i386-dis.c         |  71 +++++++----
>  opcodes/riscv-dis.c        | 147 +++++++++++-----------
>  10 files changed, 475 insertions(+), 134 deletions(-)
>
> --
> 2.25.4
>

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-16 20:53 ` [PATCH 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
@ 2022-02-17  9:35   ` Jan Beulich
  2022-02-17 16:15     ` Andrew Burgess
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-02-17  9:35 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
> @@ -9456,12 +9456,18 @@ print_insn (bfd_vma pc, instr_info *ins)
>  	{
>  	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>  	  if (name != NULL)
> -	    (*ins->info->fprintf_func) (ins->info->stream, "%s", name);
> +	    (*ins->info->fprintf_styled_func)
> +	      (ins->info->stream, dis_style_mnemonic, "%s", name);
>  	  else
>  	    {
>  	      /* Just print the first byte as a .byte instruction.  */
> -	      (*ins->info->fprintf_func) (ins->info->stream, ".byte 0x%x",
> -				     (unsigned int) priv.the_buffer[0]);
> +	      (*ins->info->fprintf_styled_func)
> +		(ins->info->stream, dis_style_mnemonic, ".byte");

Perhaps better have dis_style_directive for this? It's certainly not
an insn mnemonic.

> +	      (*ins->info->fprintf_styled_func)
> +		(ins->info->stream, dis_style_text, " ");
> +	      (*ins->info->fprintf_styled_func)
> +		(ins->info->stream, dis_style_immediate, "0x%x",
> +		 (unsigned int) priv.the_buffer[0]);

I wonder if the naming (dis_style_immediate) isn't misleading. As per
the comment next to its definition it really appears to mean any kind
of number (like is the case here), not just immediate operands of
instructions. Hence maybe dis_style_number (as replacement for or in
addition to dis_style_immediate)?

> @@ -9497,10 +9503,15 @@ print_insn (bfd_vma pc, instr_info *ins)
>        /* Handle ins->prefixes before fwait.  */
>        for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
>  	   i++)
> -	(*ins->info->fprintf_func) (ins->info->stream, "%s ",
> -				    prefix_name (ins, ins->all_prefixes[i],
> -						 sizeflag));
> -      (*ins->info->fprintf_func) (ins->info->stream, "fwait");
> +	{
> +	  (*ins->info->fprintf_styled_func)
> +	    (ins->info->stream, dis_style_mnemonic, "%s",
> +	     prefix_name (ins, ins->all_prefixes[i], sizeflag));
> +	  (*ins->info->fprintf_styled_func)
> +	    (ins->info->stream, dis_style_mnemonic, " ");

Does the style matter for blanks? If so, why "mnemonic" here, but ...

> @@ -9744,13 +9759,17 @@ print_insn (bfd_vma pc, instr_info *ins)
>  	if (name == NULL)
>  	  abort ();
>  	prefix_length += strlen (name) + 1;
> -	(*ins->info->fprintf_func) (ins->info->stream, "%s ", name);
> +	(*ins->info->fprintf_styled_func)
> +	  (ins->info->stream, dis_style_mnemonic, "%s", name);
> +	(*ins->info->fprintf_styled_func)
> +	  (ins->info->stream, dis_style_text, " ");

... "text" here? If the style didn't matter, a single call (as it was
before) would seem to suffice in both cases.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-17  9:35   ` Jan Beulich
@ 2022-02-17 16:15     ` Andrew Burgess
  2022-02-17 16:29       ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Burgess @ 2022-02-17 16:15 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>> @@ -9456,12 +9456,18 @@ print_insn (bfd_vma pc, instr_info *ins)
>>  	{
>>  	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>>  	  if (name != NULL)
>> -	    (*ins->info->fprintf_func) (ins->info->stream, "%s", name);
>> +	    (*ins->info->fprintf_styled_func)
>> +	      (ins->info->stream, dis_style_mnemonic, "%s", name);
>>  	  else
>>  	    {
>>  	      /* Just print the first byte as a .byte instruction.  */
>> -	      (*ins->info->fprintf_func) (ins->info->stream, ".byte 0x%x",
>> -				     (unsigned int) priv.the_buffer[0]);
>> +	      (*ins->info->fprintf_styled_func)
>> +		(ins->info->stream, dis_style_mnemonic, ".byte");
>
> Perhaps better have dis_style_directive for this? It's certainly not
> an insn mnemonic.

Are you suggesting directive in addition to mnemonic?  Or as a
replacement for?

My goal with the style list was to try and keep the number of styles
pretty small, an instrution mnemonic like 'add' and a directive like
'.byte' seemed to have a similar enough function that styling them
identically felt OK.

>
>> +	      (*ins->info->fprintf_styled_func)
>> +		(ins->info->stream, dis_style_text, " ");
>> +	      (*ins->info->fprintf_styled_func)
>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>> +		 (unsigned int) priv.the_buffer[0]);
>
> I wonder if the naming (dis_style_immediate) isn't misleading. As per
> the comment next to its definition it really appears to mean any kind
> of number (like is the case here), not just immediate operands of
> instructions. Hence maybe dis_style_number (as replacement for or in
> addition to dis_style_immediate)?

You mentioned this before in the previous thread, and I didn't really
understand then either.

Can you give an example of something that's a number, but not an
immediate?  e.g. I wonder (given the instruction/directive distinction
you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
you don't like referring to this 0x4 here as an immediate?

>
>> @@ -9497,10 +9503,15 @@ print_insn (bfd_vma pc, instr_info *ins)
>>        /* Handle ins->prefixes before fwait.  */
>>        for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
>>  	   i++)
>> -	(*ins->info->fprintf_func) (ins->info->stream, "%s ",
>> -				    prefix_name (ins, ins->all_prefixes[i],
>> -						 sizeflag));
>> -      (*ins->info->fprintf_func) (ins->info->stream, "fwait");
>> +	{
>> +	  (*ins->info->fprintf_styled_func)
>> +	    (ins->info->stream, dis_style_mnemonic, "%s",
>> +	     prefix_name (ins, ins->all_prefixes[i], sizeflag));
>> +	  (*ins->info->fprintf_styled_func)
>> +	    (ins->info->stream, dis_style_mnemonic, " ");
>
> Does the style matter for blanks? If so, why "mnemonic" here, but ...
>
>> @@ -9744,13 +9759,17 @@ print_insn (bfd_vma pc, instr_info *ins)
>>  	if (name == NULL)
>>  	  abort ();
>>  	prefix_length += strlen (name) + 1;
>> -	(*ins->info->fprintf_func) (ins->info->stream, "%s ", name);
>> +	(*ins->info->fprintf_styled_func)
>> +	  (ins->info->stream, dis_style_mnemonic, "%s", name);
>> +	(*ins->info->fprintf_styled_func)
>> +	  (ins->info->stream, dis_style_text, " ");
>
> ... "text" here? If the style didn't matter, a single call (as it was
> before) would seem to suffice in both cases.

No the style doesn't really matter.  I guess it might be possible that a
user could choose to style the foreground colour, in which case these
two spaces would appear different... I'd be tempted to say white space
should be printed with 'text' style.  I'll take another pass though this
patch and clean this up.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes)
  2022-02-17  3:57 ` [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nelson Chu
@ 2022-02-17 16:17   ` Andrew Burgess
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-02-17 16:17 UTC (permalink / raw)
  To: Nelson Chu; +Cc: Binutils

Nelson Chu <nelson.chu@sifive.com> writes:

> Hi Andrew,
>
> I get gas/testsuite/gas/riscv/insn testcase fail after applying the
> series of patches.  Here is the reduced case,
>
> $ cat tmp.s
> .word 0x68c58543
> fmadd.s fa0,fa1,fa2,fa3,rne
> .insn r  MADD, 0, 0, a0, a1, a2, a3
> .insn 0x68c58543
>
> All these are the same instruction encoding - fmadd.s, but the first
> ".word" will be marked and dumped as data rather than instruction.  We
> are used to get the following dump result,
>
> $ /scratch/nelsonc/build-upstream/rv64gc-elf/build-install/bin/riscv64-unknown-elf-objdump
> -d tmp.o
>
> 0000000000000000 <.text>:
>    0:   68c58543                .word   0x68c58543
>    4:   68c58543                fmadd.s fa0,fa1,fa2,fa3,rne
>    8:   68c58543                fmadd.s fa0,fa1,fa2,fa3,rne
>    c:   68c58543                fmadd.s fa0,fa1,fa2,fa3,rne
>
> But now with the patches, I will get segmentation fault when trying to
> dump fmadd.s as instruction,
>
> $ /scratch/nelsonc/build-upstream/rv64gc-elf/build-install/bin/riscv64-unknown-elf-objdump
> -d tmp.o
>
> 0000000000000000 <.text>:
>    0:   68c58543                .word   0x68c58543
> Segmentation fault (core dumped)
>
> Is it possible if you could help to see what happened?  Thanks!

Sorry about that.  I'll get this fixed and post an updated version.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-17 16:15     ` Andrew Burgess
@ 2022-02-17 16:29       ` Jan Beulich
  2022-02-17 22:37         ` Andrew Burgess
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-02-17 16:29 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 17.02.2022 17:15, Andrew Burgess wrote:
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>> @@ -9456,12 +9456,18 @@ print_insn (bfd_vma pc, instr_info *ins)
>>>  	{
>>>  	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>>>  	  if (name != NULL)
>>> -	    (*ins->info->fprintf_func) (ins->info->stream, "%s", name);
>>> +	    (*ins->info->fprintf_styled_func)
>>> +	      (ins->info->stream, dis_style_mnemonic, "%s", name);
>>>  	  else
>>>  	    {
>>>  	      /* Just print the first byte as a .byte instruction.  */
>>> -	      (*ins->info->fprintf_func) (ins->info->stream, ".byte 0x%x",
>>> -				     (unsigned int) priv.the_buffer[0]);
>>> +	      (*ins->info->fprintf_styled_func)
>>> +		(ins->info->stream, dis_style_mnemonic, ".byte");
>>
>> Perhaps better have dis_style_directive for this? It's certainly not
>> an insn mnemonic.
> 
> Are you suggesting directive in addition to mnemonic?  Or as a
> replacement for?

In addition to. They're fundamentally different things at the
assembler level. But that's just my view ...

> My goal with the style list was to try and keep the number of styles
> pretty small, an instrution mnemonic like 'add' and a directive like
> '.byte' seemed to have a similar enough function that styling them
> identically felt OK.
> 
>>
>>> +	      (*ins->info->fprintf_styled_func)
>>> +		(ins->info->stream, dis_style_text, " ");
>>> +	      (*ins->info->fprintf_styled_func)
>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>> +		 (unsigned int) priv.the_buffer[0]);
>>
>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>> the comment next to its definition it really appears to mean any kind
>> of number (like is the case here), not just immediate operands of
>> instructions. Hence maybe dis_style_number (as replacement for or in
>> addition to dis_style_immediate)?
> 
> You mentioned this before in the previous thread, and I didn't really
> understand then either.
> 
> Can you give an example of something that's a number, but not an
> immediate?  e.g. I wonder (given the instruction/directive distinction
> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
> you don't like referring to this 0x4 here as an immediate?

Well, an operand to a directive for example is not an immediate imo,
yes. A "load offset" (as your comment calls it) may also not be an
immediate. E.g. in x86 memory access instructions:

	mov	0x10(%rbx), %eax

the 0x10 isn't an immediate, but a displacement. The difference may
be more relevant in something like

	mov	$0, 0x10(%rbx)

where the $0 is an immediate operand, but the 0x10 isn't (and you
wouldn't want to mix the two).

From that comment it's not clear to me where else you would think
"immediate" applies (or not), but in RISC-V's

	lw	x0, 0x10(x0)

I wouldn't consider the 0x10 an immediate either, albeit this may
be a result of my x86 bias.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-17 16:29       ` Jan Beulich
@ 2022-02-17 22:37         ` Andrew Burgess
  2022-02-18  7:14           ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Burgess @ 2022-02-17 22:37 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 17.02.2022 17:15, Andrew Burgess wrote:
>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>>> @@ -9456,12 +9456,18 @@ print_insn (bfd_vma pc, instr_info *ins)
>>>>  	{
>>>>  	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>>>>  	  if (name != NULL)
>>>> -	    (*ins->info->fprintf_func) (ins->info->stream, "%s", name);
>>>> +	    (*ins->info->fprintf_styled_func)
>>>> +	      (ins->info->stream, dis_style_mnemonic, "%s", name);
>>>>  	  else
>>>>  	    {
>>>>  	      /* Just print the first byte as a .byte instruction.  */
>>>> -	      (*ins->info->fprintf_func) (ins->info->stream, ".byte 0x%x",
>>>> -				     (unsigned int) priv.the_buffer[0]);
>>>> +	      (*ins->info->fprintf_styled_func)
>>>> +		(ins->info->stream, dis_style_mnemonic, ".byte");
>>>
>>> Perhaps better have dis_style_directive for this? It's certainly not
>>> an insn mnemonic.
>> 
>> Are you suggesting directive in addition to mnemonic?  Or as a
>> replacement for?
>
> In addition to. They're fundamentally different things at the
> assembler level. But that's just my view ...
>
>> My goal with the style list was to try and keep the number of styles
>> pretty small, an instrution mnemonic like 'add' and a directive like
>> '.byte' seemed to have a similar enough function that styling them
>> identically felt OK.
>> 
>>>
>>>> +	      (*ins->info->fprintf_styled_func)
>>>> +		(ins->info->stream, dis_style_text, " ");
>>>> +	      (*ins->info->fprintf_styled_func)
>>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>>> +		 (unsigned int) priv.the_buffer[0]);
>>>
>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>>> the comment next to its definition it really appears to mean any kind
>>> of number (like is the case here), not just immediate operands of
>>> instructions. Hence maybe dis_style_number (as replacement for or in
>>> addition to dis_style_immediate)?
>> 
>> You mentioned this before in the previous thread, and I didn't really
>> understand then either.
>> 
>> Can you give an example of something that's a number, but not an
>> immediate?  e.g. I wonder (given the instruction/directive distinction
>> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
>> you don't like referring to this 0x4 here as an immediate?
>
> Well, an operand to a directive for example is not an immediate imo,
> yes. A "load offset" (as your comment calls it) may also not be an
> immediate. E.g. in x86 memory access instructions:
>
> 	mov	0x10(%rbx), %eax
>
> the 0x10 isn't an immediate, but a displacement. The difference may
> be more relevant in something like
>
> 	mov	$0, 0x10(%rbx)
>
> where the $0 is an immediate operand, but the 0x10 isn't (and you
> wouldn't want to mix the two).
>
> From that comment it's not clear to me where else you would think
> "immediate" applies (or not), but in RISC-V's
>
> 	lw	x0, 0x10(x0)
>
> I wouldn't consider the 0x10 an immediate either, albeit this may
> be a result of my x86 bias.

I wonder if there's a name we could come up with that would allow me to
classify the '$0' and '0x10' (in your example above) as the same style?

I've kind-of lost the thread a bit, but maybe that's what the 'number'
you suggested original was for?  If I replaced dis_style_immediate with
dis_style_number, and just replaced thoughout, would that be less
problematic?

Another possibility would be to have some aliases either in the original
enum, as in:

  dis_style_displacement = dis_style_immediate,

or even at the top of i386-dis.c, as in:

  #define dis_style_displacement dis_style_immediate

I really think we should avoid adding too many distinct styles if we
can.  My concern is less about disassembler users handling the different
styles, and more about consistency between the disassemblers.  I figure
it's easier to be consistent if we only have a small number of styles.
If displacement is a different style to other immediate (yes, I'm still
going to call numbers in instruction immediates!) then we end up with
some architectures going one way, and others another.

Anyway, that's just my feelings, if the consensus is that the more
styles the better, then I'm sure we can manage...

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-17 22:37         ` Andrew Burgess
@ 2022-02-18  7:14           ` Jan Beulich
  2022-02-19 10:54             ` Andrew Burgess
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-02-18  7:14 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 17.02.2022 23:37, Andrew Burgess wrote:
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> On 17.02.2022 17:15, Andrew Burgess wrote:
>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>> +		(ins->info->stream, dis_style_text, " ");
>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>>>> +		 (unsigned int) priv.the_buffer[0]);
>>>>
>>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>>>> the comment next to its definition it really appears to mean any kind
>>>> of number (like is the case here), not just immediate operands of
>>>> instructions. Hence maybe dis_style_number (as replacement for or in
>>>> addition to dis_style_immediate)?
>>>
>>> You mentioned this before in the previous thread, and I didn't really
>>> understand then either.
>>>
>>> Can you give an example of something that's a number, but not an
>>> immediate?  e.g. I wonder (given the instruction/directive distinction
>>> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
>>> you don't like referring to this 0x4 here as an immediate?
>>
>> Well, an operand to a directive for example is not an immediate imo,
>> yes. A "load offset" (as your comment calls it) may also not be an
>> immediate. E.g. in x86 memory access instructions:
>>
>> 	mov	0x10(%rbx), %eax
>>
>> the 0x10 isn't an immediate, but a displacement. The difference may
>> be more relevant in something like
>>
>> 	mov	$0, 0x10(%rbx)
>>
>> where the $0 is an immediate operand, but the 0x10 isn't (and you
>> wouldn't want to mix the two).
>>
>> From that comment it's not clear to me where else you would think
>> "immediate" applies (or not), but in RISC-V's
>>
>> 	lw	x0, 0x10(x0)
>>
>> I wouldn't consider the 0x10 an immediate either, albeit this may
>> be a result of my x86 bias.
> 
> I wonder if there's a name we could come up with that would allow me to
> classify the '$0' and '0x10' (in your example above) as the same style?
> 
> I've kind-of lost the thread a bit, but maybe that's what the 'number'
> you suggested original was for?  If I replaced dis_style_immediate with
> dis_style_number, and just replaced thoughout, would that be less
> problematic?

Yes, "number" was meant to be a possible replacement. Whether it's
helpful to style all forms of numbers the same is questionable
though. Note that to me in "$0" the '$' would not be covered by
"number" then, while it would be covered by "immediate".

> Another possibility would be to have some aliases either in the original
> enum, as in:
> 
>   dis_style_displacement = dis_style_immediate,
> 
> or even at the top of i386-dis.c, as in:
> 
>   #define dis_style_displacement dis_style_immediate

But that's wrong. I'm not primarily after the naming in the sources,
but after the output not showing distinct things as similar.

> I really think we should avoid adding too many distinct styles if we
> can.  My concern is less about disassembler users handling the different
> styles, and more about consistency between the disassemblers.  I figure
> it's easier to be consistent if we only have a small number of styles.
> If displacement is a different style to other immediate (yes, I'm still
> going to call numbers in instruction immediates!) then we end up with
> some architectures going one way, and others another.

I agree with the goal of limiting the number of styles. But instead
of marking distinct items with similar non-basic (text) styles, I'd
rather see items left alone then. IOW with dis_style_immediate I'd
prefer (in the x86 example) to see only true immediate insn operands
be tagged that way, and all other numbers to remain dis_style_text.

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCH 2/3] opcodes/riscv: implement style support in the disassembler
  2022-02-16 20:53 ` [PATCH 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
@ 2022-02-19 10:24   ` Andrew Burgess
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-02-19 10:24 UTC (permalink / raw)
  To: binutils

Nelson pointed out some regressions caused by this patch.

Below is an updated version with the bug fixed.

Thanks,
Andrew

---

commit 61eebdf16af9855e2965af72b9424d42ee7c827f
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Sat Feb 5 11:25:24 2022 +0000

    opcodes/riscv: implement style support in the disassembler
    
    Update the RISC-V disassembler to supply style information.  This
    allows objdump to apply syntax highlighting to the disassembler
    output (when the appropriate command line flag is used).
    
    Ignoring colours, there should be no other user visible changes in the
    output of the disassembler in either objdump or gdb.
    
    opcodes/ChangeLog:
    
            * disassembler.c (disassemble_init_for_target): Set
            created_styled_output for riscv.
            * riscv-dis.c: Changed throughout to use fprintf_styled_func
            instead of fprintf_func.

diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index 15df49ea331..b8eed9695ea 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -708,6 +708,7 @@ disassemble_init_for_target (struct disassemble_info * info)
 #ifdef ARCH_riscv
     case bfd_arch_riscv:
       info->symbol_is_valid = riscv_symbol_is_valid;
+      info->created_styled_output = true;
       break;
 #endif
 #ifdef ARCH_wasm32
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 34724d4aec5..bfd8e702b9c 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -165,7 +165,7 @@ arg_print (struct disassemble_info *info, unsigned long val,
 	   const char* const* array, size_t size)
 {
   const char *s = val >= size || array[val] == NULL ? "unknown" : array[val];
-  (*info->fprintf_func) (info->stream, "%s", s);
+  (*info->fprintf_styled_func) (info->stream, dis_style_text, "%s", s);
 }
 
 static void
@@ -195,11 +195,11 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
   struct riscv_private_data *pd = info->private_data;
   int rs1 = (l >> OP_SH_RS1) & OP_MASK_RS1;
   int rd = (l >> OP_SH_RD) & OP_MASK_RD;
-  fprintf_ftype print = info->fprintf_func;
+  fprintf_styled_ftype print = info->fprintf_styled_func;
   const char *opargStart;
 
   if (*oparg != '\0')
-    print (info->stream, "\t");
+    print (info->stream, dis_style_text, "\t");
 
   for (; *oparg != '\0'; oparg++)
     {
@@ -211,22 +211,22 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    {
 	    case 's': /* RS1 x8-x15.  */
 	    case 'w': /* RS1 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS1S, l) + 8]);
 	      break;
 	    case 't': /* RS2 x8-x15.  */
 	    case 'x': /* RS2 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS2S, l) + 8]);
 	      break;
 	    case 'U': /* RS1, constrained to equal RD.  */
-	      print (info->stream, "%s", riscv_gpr_names[rd]);
+	      print (info->stream, dis_style_register, "%s", riscv_gpr_names[rd]);
 	      break;
 	    case 'c': /* RS1, constrained to equal sp.  */
-	      print (info->stream, "%s", riscv_gpr_names[X_SP]);
+	      print (info->stream, dis_style_register, "%s", riscv_gpr_names[X_SP]);
 	      break;
 	    case 'V': /* RS2 */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS2, l)]);
 	      break;
 	    case 'o':
@@ -236,31 +236,31 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      if (info->mach == bfd_mach_riscv64
 		  && ((l & MASK_C_ADDIW) == MATCH_C_ADDIW) && rd != 0)
 		maybe_print_address (pd, rd, EXTRACT_CITYPE_IMM (l), 1);
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_IMM (l));
 	      break;
 	    case 'k':
-	      print (info->stream, "%d", (int)EXTRACT_CLTYPE_LW_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CLTYPE_LW_IMM (l));
 	      break;
 	    case 'l':
-	      print (info->stream, "%d", (int)EXTRACT_CLTYPE_LD_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CLTYPE_LD_IMM (l));
 	      break;
 	    case 'm':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_LWSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_LWSP_IMM (l));
 	      break;
 	    case 'n':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_LDSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_LDSP_IMM (l));
 	      break;
 	    case 'K':
-	      print (info->stream, "%d", (int)EXTRACT_CIWTYPE_ADDI4SPN_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CIWTYPE_ADDI4SPN_IMM (l));
 	      break;
 	    case 'L':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_ADDI16SP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CITYPE_ADDI16SP_IMM (l));
 	      break;
 	    case 'M':
-	      print (info->stream, "%d", (int)EXTRACT_CSSTYPE_SWSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CSSTYPE_SWSP_IMM (l));
 	      break;
 	    case 'N':
-	      print (info->stream, "%d", (int)EXTRACT_CSSTYPE_SDSP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_CSSTYPE_SDSP_IMM (l));
 	      break;
 	    case 'p':
 	      info->target = EXTRACT_CBTYPE_IMM (l) + pc;
@@ -271,21 +271,21 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      (*info->print_address_func) (info->target, info);
 	      break;
 	    case 'u':
-	      print (info->stream, "0x%x",
+	      print (info->stream, dis_style_immediate, "0x%x",
 		     (int)(EXTRACT_CITYPE_IMM (l) & (RISCV_BIGIMM_REACH-1)));
 	      break;
 	    case '>':
-	      print (info->stream, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x3f);
+	      print (info->stream, dis_style_immediate, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x3f);
 	      break;
 	    case '<':
-	      print (info->stream, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x1f);
+	      print (info->stream, dis_style_immediate, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x1f);
 	      break;
 	    case 'T': /* Floating-point RS2.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_fpr_names[EXTRACT_OPERAND (CRS2, l)]);
 	      break;
 	    case 'D': /* Floating-point RS2 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_fpr_names[EXTRACT_OPERAND (CRS2S, l) + 8]);
 	      break;
 	    }
@@ -296,28 +296,28 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    {
 	    case 'd':
 	    case 'f':
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VD, l)]);
 	      break;
 	    case 'e':
 	      if (!EXTRACT_OPERAND (VWD, l))
-		print (info->stream, "%s", riscv_gpr_names[0]);
+		print (info->stream, dis_style_register, "%s", riscv_gpr_names[0]);
 	      else
-		print (info->stream, "%s",
+		print (info->stream, dis_style_register, "%s",
 		       riscv_vecr_names_numeric[EXTRACT_OPERAND (VD, l)]);
 	      break;
 	    case 's':
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VS1, l)]);
 	      break;
 	    case 't':
 	    case 'u': /* VS1 == VS2 already verified at this point.  */
 	    case 'v': /* VD == VS1 == VS2 already verified at this point.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VS2, l)]);
 	      break;
 	    case '0':
-	      print (info->stream, "%s", riscv_vecr_names_numeric[0]);
+	      print (info->stream, dis_style_register, "%s", riscv_vecr_names_numeric[0]);
 	      break;
 	    case 'b':
 	    case 'c':
@@ -337,25 +337,26 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 		    && !imm_vtype_res
 		    && riscv_vsew[imm_vsew] != NULL
 		    && riscv_vlmul[imm_vlmul] != NULL)
-		  print (info->stream, "%s,%s,%s,%s", riscv_vsew[imm_vsew],
+		  print (info->stream, dis_style_text, "%s,%s,%s,%s",
+			 riscv_vsew[imm_vsew],
 			 riscv_vlmul[imm_vlmul], riscv_vta[imm_vta],
 			 riscv_vma[imm_vma]);
 		else
-		  print (info->stream, "%d", imm);
+		  print (info->stream, dis_style_immediate, "%d", imm);
 	      }
 	      break;
 	    case 'i':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_VI_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_RVV_VI_IMM (l));
 	      break;
 	    case 'j':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_VI_UIMM (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_RVV_VI_UIMM (l));
 	      break;
 	    case 'k':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_OFFSET (l));
+	      print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_RVV_OFFSET (l));
 	      break;
 	    case 'm':
 	      if (! EXTRACT_OPERAND (VMASK, l))
-		print (info->stream, ",%s", riscv_vecm_names_numeric[0]);
+		print (info->stream, dis_style_register, ",%s", riscv_vecm_names_numeric[0]);
 	      break;
 	    }
 	  break;
@@ -365,29 +366,29 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	case ')':
 	case '[':
 	case ']':
-	  print (info->stream, "%c", *oparg);
+	  print (info->stream, dis_style_text, "%c", *oparg);
 	  break;
 
 	case '0':
 	  /* Only print constant 0 if it is the last argument.  */
 	  if (!oparg[1])
-	    print (info->stream, "0");
+	    print (info->stream, dis_style_immediate, "0");
 	  break;
 
 	case 'b':
 	case 's':
 	  if ((l & MASK_JALR) == MATCH_JALR)
 	    maybe_print_address (pd, rs1, 0, 0);
-	  print (info->stream, "%s", riscv_gpr_names[rs1]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[rs1]);
 	  break;
 
 	case 't':
-	  print (info->stream, "%s",
+	  print (info->stream, dis_style_register, "%s",
 		 riscv_gpr_names[EXTRACT_OPERAND (RS2, l)]);
 	  break;
 
 	case 'u':
-	  print (info->stream, "0x%x",
+	  print (info->stream, dis_style_immediate, "0x%x",
 		 (unsigned)EXTRACT_UTYPE_IMM (l) >> RISCV_IMM_BITS);
 	  break;
 
@@ -416,12 +417,12 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	  if (info->mach == bfd_mach_riscv64
 	      && ((l & MASK_ADDIW) == MATCH_ADDIW) && rs1 != 0)
 	    maybe_print_address (pd, rs1, EXTRACT_ITYPE_IMM (l), 1);
-	  print (info->stream, "%d", (int)EXTRACT_ITYPE_IMM (l));
+	  print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_ITYPE_IMM (l));
 	  break;
 
 	case 'q':
 	  maybe_print_address (pd, rs1, EXTRACT_STYPE_IMM (l), 0);
-	  print (info->stream, "%d", (int)EXTRACT_STYPE_IMM (l));
+	  print (info->stream, dis_style_immediate, "%d", (int)EXTRACT_STYPE_IMM (l));
 	  break;
 
 	case 'a':
@@ -441,40 +442,40 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    pd->hi_addr[rd] = EXTRACT_UTYPE_IMM (l);
 	  else if ((l & MASK_C_LUI) == MATCH_C_LUI)
 	    pd->hi_addr[rd] = EXTRACT_CITYPE_LUI_IMM (l);
-	  print (info->stream, "%s", riscv_gpr_names[rd]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[rd]);
 	  break;
 
 	case 'y':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (BS, l));
+	  print (info->stream, dis_style_text, "0x%x", (int)EXTRACT_OPERAND (BS, l));
 	  break;
 
 	case 'z':
-	  print (info->stream, "%s", riscv_gpr_names[0]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[0]);
 	  break;
 
 	case '>':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (SHAMT, l));
+	  print (info->stream, dis_style_immediate, "0x%x", (int)EXTRACT_OPERAND (SHAMT, l));
 	  break;
 
 	case '<':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (SHAMTW, l));
+	  print (info->stream, dis_style_text, "0x%x", (int)EXTRACT_OPERAND (SHAMTW, l));
 	  break;
 
 	case 'S':
 	case 'U':
-	  print (info->stream, "%s", riscv_fpr_names[rs1]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[rs1]);
 	  break;
 
 	case 'T':
-	  print (info->stream, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS2, l)]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS2, l)]);
 	  break;
 
 	case 'D':
-	  print (info->stream, "%s", riscv_fpr_names[rd]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[rd]);
 	  break;
 
 	case 'R':
-	  print (info->stream, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS3, l)]);
+	  print (info->stream, dis_style_text, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS3, l)]);
 	  break;
 
 	case 'E':
@@ -507,23 +508,23 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      }
 
 	    if (riscv_csr_hash[csr] != NULL)
-	      print (info->stream, "%s", riscv_csr_hash[csr]);
+	      print (info->stream, dis_style_text, "%s", riscv_csr_hash[csr]);
 	    else
-	      print (info->stream, "0x%x", csr);
+	      print (info->stream, dis_style_text, "0x%x", csr);
 	    break;
 	  }
 
 	case 'Y':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (RNUM, l));
+	  print (info->stream, dis_style_text, "0x%x", (int)EXTRACT_OPERAND (RNUM, l));
 	  break;
 
 	case 'Z':
-	  print (info->stream, "%d", rs1);
+	  print (info->stream, dis_style_text, "%d", rs1);
 	  break;
 
 	default:
 	  /* xgettext:c-format */
-	  print (info->stream, _("# internal error, undefined modifier (%c)"),
+	  print (info->stream, dis_style_text, _("# internal error, undefined modifier (%c)"),
 		 *opargStart);
 	  return;
 	}
@@ -623,14 +624,15 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 	    continue;
 
 	  /* It's a match.  */
-	  (*info->fprintf_func) (info->stream, "%s", op->name);
+	  (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic, "%s", op->name);
 	  print_insn_args (op->args, word, memaddr, info);
 
 	  /* Try to disassemble multi-instruction addressing sequences.  */
 	  if (pd->print_addr != (bfd_vma)-1)
 	    {
 	      info->target = pd->print_addr;
-	      (*info->fprintf_func) (info->stream, " # ");
+	      (*info->fprintf_styled_func)
+		(info->stream, dis_style_comment_start, " # ");
 	      (*info->print_address_func) (info->target, info);
 	      pd->print_addr = -1;
 	    }
@@ -672,19 +674,24 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
     case 2:
     case 4:
     case 8:
-      (*info->fprintf_func) (info->stream, ".%dbyte\t0x%llx",
-                             insnlen, (unsigned long long) word);
+      (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
+				    ".%dbyte\t", insnlen);
+      (*info->fprintf_styled_func) (info->stream, dis_style_immediate,
+				    "0x%llx", (unsigned long long) word);
       break;
     default:
       {
         int i;
-        (*info->fprintf_func) (info->stream, ".byte\t");
+	(*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
+				      ".byte\t");
         for (i = 0; i < insnlen; ++i)
           {
             if (i > 0)
-              (*info->fprintf_func) (info->stream, ", ");
-            (*info->fprintf_func) (info->stream, "0x%02x",
-                                   (unsigned int) (word & 0xff));
+	      (*info->fprintf_styled_func) (info->stream, dis_style_text,
+					    ", ");
+	    (*info->fprintf_styled_func) (info->stream, dis_style_immediate,
+					  "0x%02x",
+					  (unsigned int) (word & 0xff));
             word >>= 8;
           }
       }
@@ -863,22 +870,22 @@ riscv_disassemble_data (bfd_vma memaddr ATTRIBUTE_UNUSED,
     {
     case 1:
       info->bytes_per_line = 6;
-      (*info->fprintf_func) (info->stream, ".byte\t0x%02llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".byte\t0x%02llx",
 			     (unsigned long long) data);
       break;
     case 2:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".short\t0x%04llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".short\t0x%04llx",
 			     (unsigned long long) data);
       break;
     case 4:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".word\t0x%08llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".word\t0x%08llx",
 			     (unsigned long long) data);
       break;
     case 8:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".dword\t0x%016llx",
+      (*info->fprintf_styled_func) (info->stream, dis_style_text, ".dword\t0x%016llx",
 			     (unsigned long long) data);
       break;
     default:


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-18  7:14           ` Jan Beulich
@ 2022-02-19 10:54             ` Andrew Burgess
  2022-02-21 13:08               ` Jan Beulich
  0 siblings, 1 reply; 21+ messages in thread
From: Andrew Burgess @ 2022-02-19 10:54 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 17.02.2022 23:37, Andrew Burgess wrote:
>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>> On 17.02.2022 17:15, Andrew Burgess wrote:
>>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>> +		(ins->info->stream, dis_style_text, " ");
>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>>>>> +		 (unsigned int) priv.the_buffer[0]);
>>>>>
>>>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>>>>> the comment next to its definition it really appears to mean any kind
>>>>> of number (like is the case here), not just immediate operands of
>>>>> instructions. Hence maybe dis_style_number (as replacement for or in
>>>>> addition to dis_style_immediate)?
>>>>
>>>> You mentioned this before in the previous thread, and I didn't really
>>>> understand then either.
>>>>
>>>> Can you give an example of something that's a number, but not an
>>>> immediate?  e.g. I wonder (given the instruction/directive distinction
>>>> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
>>>> you don't like referring to this 0x4 here as an immediate?
>>>
>>> Well, an operand to a directive for example is not an immediate imo,
>>> yes. A "load offset" (as your comment calls it) may also not be an
>>> immediate. E.g. in x86 memory access instructions:
>>>
>>> 	mov	0x10(%rbx), %eax
>>>
>>> the 0x10 isn't an immediate, but a displacement. The difference may
>>> be more relevant in something like
>>>
>>> 	mov	$0, 0x10(%rbx)
>>>
>>> where the $0 is an immediate operand, but the 0x10 isn't (and you
>>> wouldn't want to mix the two).
>>>
>>> From that comment it's not clear to me where else you would think
>>> "immediate" applies (or not), but in RISC-V's
>>>
>>> 	lw	x0, 0x10(x0)
>>>
>>> I wouldn't consider the 0x10 an immediate either, albeit this may
>>> be a result of my x86 bias.
>> 
>> I wonder if there's a name we could come up with that would allow me to
>> classify the '$0' and '0x10' (in your example above) as the same style?
>> 
>> I've kind-of lost the thread a bit, but maybe that's what the 'number'
>> you suggested original was for?  If I replaced dis_style_immediate with
>> dis_style_number, and just replaced thoughout, would that be less
>> problematic?
>
> Yes, "number" was meant to be a possible replacement. Whether it's
> helpful to style all forms of numbers the same is questionable
> though. Note that to me in "$0" the '$' would not be covered by
> "number" then, while it would be covered by "immediate".
>
>> Another possibility would be to have some aliases either in the original
>> enum, as in:
>> 
>>   dis_style_displacement = dis_style_immediate,
>> 
>> or even at the top of i386-dis.c, as in:
>> 
>>   #define dis_style_displacement dis_style_immediate
>
> But that's wrong. I'm not primarily after the naming in the sources,
> but after the output not showing distinct things as similar.
>
>> I really think we should avoid adding too many distinct styles if we
>> can.  My concern is less about disassembler users handling the different
>> styles, and more about consistency between the disassemblers.  I figure
>> it's easier to be consistent if we only have a small number of styles.
>> If displacement is a different style to other immediate (yes, I'm still
>> going to call numbers in instruction immediates!) then we end up with
>> some architectures going one way, and others another.
>
> I agree with the goal of limiting the number of styles. But instead
> of marking distinct items with similar non-basic (text) styles, I'd
> rather see items left alone then. IOW with dis_style_immediate I'd
> prefer (in the x86 example) to see only true immediate insn operands
> be tagged that way, and all other numbers to remain dis_style_text.

Having reviewed the thread, I've tried to come up with the complete list
of styles that I believe your are arguing for.  These incude a new
directive style, and an address_offset style.  I've extended the text in
several places to cover how to handle prefixes, as well as how styles
should be used for directives.

I'd be grateful if you could read through this list, and give any
examples of things which, if styled using the rules below, you feel
would be unacceptable.


  /* This is the default style, use this for any additional syntax
     (e.g. commas between operands, brackets, etc), or just as a default if
     no other style seems appropriate.  */
  dis_style_text,

  /* Use this for all instruction mnemonics, or aliases for mnemonics.
     These should be things that correspond to real machine
     instructions.  */
  dis_style_mnemonic,

  /* For things that aren't real machine instructions, but rather
     assembler directives, e.g. .byte, etc.  */
  dis_style_assembler_directive,

  /* Use this for any register names.  This may or may-not include any
     register prefix, e.g. '$', '%', at the discretion of the target,
     though within each target the choice to include prefixes for not
     should be kept consistent.  If the prefix is not printed with this
     style, then dis_style_text should be used.  */
  dis_style_register,

  /* Use this for any constant values used within instructions or
     directives, unless the value is an absolute address, or an offset
     that will be added to an address (no matter where the address comes
     from) before use.  This style may, or may-not be used for any
     prefix to the immediate value, e.g. '$', at the discretion of the
     target, though within each target the choice to include these
     prefixes should be kept consistent.  */
  dis_style_immediate,

  /* The style for the numerical representation of an absolute address.
     Anything that is an address offset should use the immediate style.
     This style may, or may-not be used for any prefix to the immediate
     value, e.g. '$', at the discretion of the target, though within
     each target the choice to include these prefixes should be kept
     consistent.  */
  dis_style_address,

  /* The style for any constant value within an instruction or directive
     that represents an offset that will be added to an address before
     use.  This style may, or may-not be used for any prefix to the
     immediate value, e.g. '$', at the discretion of the target, though
     within each target the choice to include these prefixes should be
     kept consistent.  */
  dis_style_address_offset,

  /* The style for a symbol's name.  The numerical address of a symbol
     should use the address style above, this style is reserved for the
     name.  */
  dis_style_symbol,

  /* The start of a comment that runs to the end of the line.  Anything
     printed after a comment start might be styled differently,
     e.g. everything might be styled as a comment, regardless of the
     actual style used.  The disassembler itself should not try to adjust
     the style emitted for comment content, e.g. an address emitted within
     a comment should still be given dis_style_address, in this way it is
     up to the user of the disassembler to decide how comments should be
     styled.  */
  dis_style_comment_start


Thanks,
Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-19 10:54             ` Andrew Burgess
@ 2022-02-21 13:08               ` Jan Beulich
  2022-02-21 18:01                 ` Andrew Burgess
  0 siblings, 1 reply; 21+ messages in thread
From: Jan Beulich @ 2022-02-21 13:08 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 19.02.2022 11:54, Andrew Burgess wrote:
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
> 
>> On 17.02.2022 23:37, Andrew Burgess wrote:
>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>> On 17.02.2022 17:15, Andrew Burgess wrote:
>>>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>>> +		(ins->info->stream, dis_style_text, " ");
>>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>>>>>> +		 (unsigned int) priv.the_buffer[0]);
>>>>>>
>>>>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>>>>>> the comment next to its definition it really appears to mean any kind
>>>>>> of number (like is the case here), not just immediate operands of
>>>>>> instructions. Hence maybe dis_style_number (as replacement for or in
>>>>>> addition to dis_style_immediate)?
>>>>>
>>>>> You mentioned this before in the previous thread, and I didn't really
>>>>> understand then either.
>>>>>
>>>>> Can you give an example of something that's a number, but not an
>>>>> immediate?  e.g. I wonder (given the instruction/directive distinction
>>>>> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
>>>>> you don't like referring to this 0x4 here as an immediate?
>>>>
>>>> Well, an operand to a directive for example is not an immediate imo,
>>>> yes. A "load offset" (as your comment calls it) may also not be an
>>>> immediate. E.g. in x86 memory access instructions:
>>>>
>>>> 	mov	0x10(%rbx), %eax
>>>>
>>>> the 0x10 isn't an immediate, but a displacement. The difference may
>>>> be more relevant in something like
>>>>
>>>> 	mov	$0, 0x10(%rbx)
>>>>
>>>> where the $0 is an immediate operand, but the 0x10 isn't (and you
>>>> wouldn't want to mix the two).
>>>>
>>>> From that comment it's not clear to me where else you would think
>>>> "immediate" applies (or not), but in RISC-V's
>>>>
>>>> 	lw	x0, 0x10(x0)
>>>>
>>>> I wouldn't consider the 0x10 an immediate either, albeit this may
>>>> be a result of my x86 bias.
>>>
>>> I wonder if there's a name we could come up with that would allow me to
>>> classify the '$0' and '0x10' (in your example above) as the same style?
>>>
>>> I've kind-of lost the thread a bit, but maybe that's what the 'number'
>>> you suggested original was for?  If I replaced dis_style_immediate with
>>> dis_style_number, and just replaced thoughout, would that be less
>>> problematic?
>>
>> Yes, "number" was meant to be a possible replacement. Whether it's
>> helpful to style all forms of numbers the same is questionable
>> though. Note that to me in "$0" the '$' would not be covered by
>> "number" then, while it would be covered by "immediate".
>>
>>> Another possibility would be to have some aliases either in the original
>>> enum, as in:
>>>
>>>   dis_style_displacement = dis_style_immediate,
>>>
>>> or even at the top of i386-dis.c, as in:
>>>
>>>   #define dis_style_displacement dis_style_immediate
>>
>> But that's wrong. I'm not primarily after the naming in the sources,
>> but after the output not showing distinct things as similar.
>>
>>> I really think we should avoid adding too many distinct styles if we
>>> can.  My concern is less about disassembler users handling the different
>>> styles, and more about consistency between the disassemblers.  I figure
>>> it's easier to be consistent if we only have a small number of styles.
>>> If displacement is a different style to other immediate (yes, I'm still
>>> going to call numbers in instruction immediates!) then we end up with
>>> some architectures going one way, and others another.
>>
>> I agree with the goal of limiting the number of styles. But instead
>> of marking distinct items with similar non-basic (text) styles, I'd
>> rather see items left alone then. IOW with dis_style_immediate I'd
>> prefer (in the x86 example) to see only true immediate insn operands
>> be tagged that way, and all other numbers to remain dis_style_text.
> 
> Having reviewed the thread, I've tried to come up with the complete list
> of styles that I believe your are arguing for.  These incude a new
> directive style, and an address_offset style.  I've extended the text in
> several places to cover how to handle prefixes, as well as how styles
> should be used for directives.
> 
> I'd be grateful if you could read through this list, and give any
> examples of things which, if styled using the rules below, you feel
> would be unacceptable.

I think this all looks good (but of course once this can actually
be seen in use, more things may pop up), provided ...

>   /* This is the default style, use this for any additional syntax
>      (e.g. commas between operands, brackets, etc), or just as a default if
>      no other style seems appropriate.  */
>   dis_style_text,
> 
>   /* Use this for all instruction mnemonics, or aliases for mnemonics.
>      These should be things that correspond to real machine
>      instructions.  */
>   dis_style_mnemonic,
> 
>   /* For things that aren't real machine instructions, but rather
>      assembler directives, e.g. .byte, etc.  */
>   dis_style_assembler_directive,
> 
>   /* Use this for any register names.  This may or may-not include any
>      register prefix, e.g. '$', '%', at the discretion of the target,
>      though within each target the choice to include prefixes for not
>      should be kept consistent.  If the prefix is not printed with this
>      style, then dis_style_text should be used.  */
>   dis_style_register,
> 
>   /* Use this for any constant values used within instructions or
>      directives, unless the value is an absolute address, or an offset
>      that will be added to an address (no matter where the address comes
>      from) before use.  This style may, or may-not be used for any
>      prefix to the immediate value, e.g. '$', at the discretion of the
>      target, though within each target the choice to include these
>      prefixes should be kept consistent.  */
>   dis_style_immediate,
> 
>   /* The style for the numerical representation of an absolute address.
>      Anything that is an address offset should use the immediate style.
>      This style may, or may-not be used for any prefix to the immediate
>      value, e.g. '$', at the discretion of the target, though within
>      each target the choice to include these prefixes should be kept
>      consistent.  */
>   dis_style_address,
> 
>   /* The style for any constant value within an instruction or directive
>      that represents an offset that will be added to an address before
>      use.  This style may, or may-not be used for any prefix to the
>      immediate value, e.g. '$', at the discretion of the target, though
>      within each target the choice to include these prefixes should be
>      kept consistent.  */
>   dis_style_address_offset,

... it was more a copy-n-paste mistake to repeat the reference to $ etc
in these latter two? Or is this to cover e.g. Arm prefixing numbers by
# in a wider fashion than x86's use of $? In this case, using # as the
example character may avoid some confusion.

>   /* The style for a symbol's name.  The numerical address of a symbol
>      should use the address style above, this style is reserved for the
>      name.  */
>   dis_style_symbol,

There may be a remaining ambiguity here: What is the intended style to
be used for <symbol>+<offset>? Just dis_style_symbol or first
dis_style_symbol and then dis_style_address_offset?

Jan


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
  2022-02-21 13:08               ` Jan Beulich
@ 2022-02-21 18:01                 ` Andrew Burgess
  0 siblings, 0 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-02-21 18:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 19.02.2022 11:54, Andrew Burgess wrote:
>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> 
>>> On 17.02.2022 23:37, Andrew Burgess wrote:
>>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>>> On 17.02.2022 17:15, Andrew Burgess wrote:
>>>>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>>>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>>>> +		(ins->info->stream, dis_style_text, " ");
>>>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>>>>>>> +		 (unsigned int) priv.the_buffer[0]);
>>>>>>>
>>>>>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>>>>>>> the comment next to its definition it really appears to mean any kind
>>>>>>> of number (like is the case here), not just immediate operands of
>>>>>>> instructions. Hence maybe dis_style_number (as replacement for or in
>>>>>>> addition to dis_style_immediate)?
>>>>>>
>>>>>> You mentioned this before in the previous thread, and I didn't really
>>>>>> understand then either.
>>>>>>
>>>>>> Can you give an example of something that's a number, but not an
>>>>>> immediate?  e.g. I wonder (given the instruction/directive distinction
>>>>>> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe
>>>>>> you don't like referring to this 0x4 here as an immediate?
>>>>>
>>>>> Well, an operand to a directive for example is not an immediate imo,
>>>>> yes. A "load offset" (as your comment calls it) may also not be an
>>>>> immediate. E.g. in x86 memory access instructions:
>>>>>
>>>>> 	mov	0x10(%rbx), %eax
>>>>>
>>>>> the 0x10 isn't an immediate, but a displacement. The difference may
>>>>> be more relevant in something like
>>>>>
>>>>> 	mov	$0, 0x10(%rbx)
>>>>>
>>>>> where the $0 is an immediate operand, but the 0x10 isn't (and you
>>>>> wouldn't want to mix the two).
>>>>>
>>>>> From that comment it's not clear to me where else you would think
>>>>> "immediate" applies (or not), but in RISC-V's
>>>>>
>>>>> 	lw	x0, 0x10(x0)
>>>>>
>>>>> I wouldn't consider the 0x10 an immediate either, albeit this may
>>>>> be a result of my x86 bias.
>>>>
>>>> I wonder if there's a name we could come up with that would allow me to
>>>> classify the '$0' and '0x10' (in your example above) as the same style?
>>>>
>>>> I've kind-of lost the thread a bit, but maybe that's what the 'number'
>>>> you suggested original was for?  If I replaced dis_style_immediate with
>>>> dis_style_number, and just replaced thoughout, would that be less
>>>> problematic?
>>>
>>> Yes, "number" was meant to be a possible replacement. Whether it's
>>> helpful to style all forms of numbers the same is questionable
>>> though. Note that to me in "$0" the '$' would not be covered by
>>> "number" then, while it would be covered by "immediate".
>>>
>>>> Another possibility would be to have some aliases either in the original
>>>> enum, as in:
>>>>
>>>>   dis_style_displacement = dis_style_immediate,
>>>>
>>>> or even at the top of i386-dis.c, as in:
>>>>
>>>>   #define dis_style_displacement dis_style_immediate
>>>
>>> But that's wrong. I'm not primarily after the naming in the sources,
>>> but after the output not showing distinct things as similar.
>>>
>>>> I really think we should avoid adding too many distinct styles if we
>>>> can.  My concern is less about disassembler users handling the different
>>>> styles, and more about consistency between the disassemblers.  I figure
>>>> it's easier to be consistent if we only have a small number of styles.
>>>> If displacement is a different style to other immediate (yes, I'm still
>>>> going to call numbers in instruction immediates!) then we end up with
>>>> some architectures going one way, and others another.
>>>
>>> I agree with the goal of limiting the number of styles. But instead
>>> of marking distinct items with similar non-basic (text) styles, I'd
>>> rather see items left alone then. IOW with dis_style_immediate I'd
>>> prefer (in the x86 example) to see only true immediate insn operands
>>> be tagged that way, and all other numbers to remain dis_style_text.
>> 
>> Having reviewed the thread, I've tried to come up with the complete list
>> of styles that I believe your are arguing for.  These incude a new
>> directive style, and an address_offset style.  I've extended the text in
>> several places to cover how to handle prefixes, as well as how styles
>> should be used for directives.
>> 
>> I'd be grateful if you could read through this list, and give any
>> examples of things which, if styled using the rules below, you feel
>> would be unacceptable.
>
> I think this all looks good (but of course once this can actually
> be seen in use, more things may pop up), provided ...

For sure.  But I think this conversation has been really constructive to
address some early issues.

>
>>   /* This is the default style, use this for any additional syntax
>>      (e.g. commas between operands, brackets, etc), or just as a default if
>>      no other style seems appropriate.  */
>>   dis_style_text,
>> 
>>   /* Use this for all instruction mnemonics, or aliases for mnemonics.
>>      These should be things that correspond to real machine
>>      instructions.  */
>>   dis_style_mnemonic,
>> 
>>   /* For things that aren't real machine instructions, but rather
>>      assembler directives, e.g. .byte, etc.  */
>>   dis_style_assembler_directive,
>> 
>>   /* Use this for any register names.  This may or may-not include any
>>      register prefix, e.g. '$', '%', at the discretion of the target,
>>      though within each target the choice to include prefixes for not
>>      should be kept consistent.  If the prefix is not printed with this
>>      style, then dis_style_text should be used.  */
>>   dis_style_register,
>> 
>>   /* Use this for any constant values used within instructions or
>>      directives, unless the value is an absolute address, or an offset
>>      that will be added to an address (no matter where the address comes
>>      from) before use.  This style may, or may-not be used for any
>>      prefix to the immediate value, e.g. '$', at the discretion of the
>>      target, though within each target the choice to include these
>>      prefixes should be kept consistent.  */
>>   dis_style_immediate,
>> 
>>   /* The style for the numerical representation of an absolute address.
>>      Anything that is an address offset should use the immediate style.
>>      This style may, or may-not be used for any prefix to the immediate
>>      value, e.g. '$', at the discretion of the target, though within
>>      each target the choice to include these prefixes should be kept
>>      consistent.  */
>>   dis_style_address,
>> 
>>   /* The style for any constant value within an instruction or directive
>>      that represents an offset that will be added to an address before
>>      use.  This style may, or may-not be used for any prefix to the
>>      immediate value, e.g. '$', at the discretion of the target, though
>>      within each target the choice to include these prefixes should be
>>      kept consistent.  */
>>   dis_style_address_offset,
>
> ... it was more a copy-n-paste mistake to repeat the reference to $ etc
> in these latter two? Or is this to cover e.g. Arm prefixing numbers by
> # in a wider fashion than x86's use of $? In this case, using # as the
> example character may avoid some confusion.

I didn't have any particular architecture in mind, I just wanted to
maintain a symmetry with dis_style_immediate (that there might be a
prefix, and it could be given this style, if that's the preference).  As
you suggest, I'll update the text to use a different character so it
look less like a copy & paste error.

>
>>   /* The style for a symbol's name.  The numerical address of a symbol
>>      should use the address style above, this style is reserved for the
>>      name.  */
>>   dis_style_symbol,
>
> There may be a remaining ambiguity here: What is the intended style to
> be used for <symbol>+<offset>? Just dis_style_symbol or first
> dis_style_symbol and then dis_style_address_offset?

Yes, I'd expect a mix of dis_style_symbol and dis_style_address_offset
in this case.  I'll expand the comment to explicitly mention this case.

I'll merge these fixes locally, and wait to see if anyone else has any
feedback before I repost this series.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output
  2022-02-16 20:53 ` [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
@ 2022-02-28 15:54   ` Tom Tromey
  0 siblings, 0 replies; 21+ messages in thread
From: Tom Tromey @ 2022-02-28 15:54 UTC (permalink / raw)
  To: Andrew Burgess via Binutils; +Cc: Andrew Burgess

>>>>> "Andrew" == Andrew Burgess via Binutils <binutils@sourceware.org> writes:

Andrew> I do have an idea about using possibly an environment variable to
Andrew> allow the objdump colors to be customised, but I haven't done anything
Andrew> like that in this commit, the color choices are just fixed in the code
Andrew> for now.

For this, there's some prior art in coreutils and gcc.
There's also GNU libtextstyle, but that may be overkill.

Tom

^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 0/3] disassembler syntax highlighting in objdump (via libopcodes)
  2022-02-16 20:53 [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Andrew Burgess
                   ` (3 preceding siblings ...)
  2022-02-17  3:57 ` [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nelson Chu
@ 2022-03-21 14:33 ` Andrew Burgess
  2022-03-21 14:33   ` [PATCHv2 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
                     ` (3 more replies)
  4 siblings, 4 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-03-21 14:33 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

This series is a serious attempt at what I discussed here:

  https://sourceware.org/pipermail/binutils/2021-December/118806.html

This series changes libopcodes so that this disassemblers can supply
styling information with every piece of disassembly output, e.g. is
this a register?  an address?  a mnemonic?  etc.

Users of the disassembler can then choose to make use of this
information to add styling to the disassembler output.

And that is what I do for objdump in this series.  The styling is off
by default, but can be turned on with a new command line flag:
    --disassembler-color=off|color|extended-color

I've updated GDB enough to keep it building and running after this
change, though at this point GDB doesn't make use of the new styling
information, that will come later.

All feedback would be welcome.

Changes since v1:

  - After discussion with Jan I've now added additional disassembler
    styles (see enum disassembler_style in includes/dis-asm.h),

  - I've updated the riscv-dis.c and i386-dis.c to make use of these
    new styles, and fixed a few places (mostly in riscv-dis.c) where
    the wrong style was being used,

  - I've gone through my changes in riscv-dis.c and i386-dis.c and
    made sure that lines are all under 80 characters,

  - I've extended binutils/objdump.c to handle the new styles.  For
    now I've not given these styles separate colours, but used
    existing grouping similar styles together, e.g. anything that is a
    number (immediate, address, address offset) all gets the same
    colour.

  - The crash that Nelson reported from riscv-dis.c is fixed.

Andrew Burgess (3):
  objdump/opcodes: add syntax highlighting to disassembler output
  opcodes/riscv: implement style support in the disassembler
  opcodes/i386: partially implement disassembler style support

 binutils/NEWS              |   4 +
 binutils/doc/binutils.texi |  11 ++
 binutils/objdump.c         | 249 ++++++++++++++++++++++++++++++++-----
 gdb/disasm.c               |  34 ++++-
 gdb/disasm.h               |   7 ++
 include/dis-asm.h          |  88 ++++++++++++-
 opcodes/dis-init.c         |   5 +-
 opcodes/disassemble.c      |  23 +++-
 opcodes/i386-dis.c         |  63 ++++++----
 opcodes/riscv-dis.c        | 193 +++++++++++++++++-----------
 10 files changed, 541 insertions(+), 136 deletions(-)

-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 1/3] objdump/opcodes: add syntax highlighting to disassembler output
  2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
@ 2022-03-21 14:33   ` Andrew Burgess
  2022-03-21 14:33   ` [PATCHv2 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
                     ` (2 subsequent siblings)
  3 siblings, 0 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-03-21 14:33 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

This commit adds the _option_ of having disassembler output syntax
highlighted in objdump.  This option is _off_ by default.  The new
command line options are:

  --disassembler-color=off		# The default.
  --disassembler-color=color
  --disassembler-color=extended-color

I have implemented two colour modes, using the same option names as we
use of --visualize-jumps, a basic 8-color mode ("color"), and an
extended 8bit color mode ("extended-color").

The syntax highlighting requires that each targets disassembler be
updated; each time the disassembler produces some output we now pass
through an additional parameter indicating what style should be
applied to the text.

As updating all target disassemblers is a large task, the old API is
maintained.  And so, a user of the disassembler (i.e. objdump, gdb)
must provide two functions, the current non-styled print function, and
a new, styled print function.

I don't currently have a plan for converting every single target
disassembler, my hope is that interested folk will update the
disassemblers they are interested in.  But it is possible some might
never get updated.

In this initial series I intend to convert the RISC-V disassembler
completely, and also do a partial conversion of the x86 disassembler.
Hopefully having the x86 disassembler at least partial converted will
allow more people to try this out easily and provide feedback.

In this commit I have focused on objdump.  The changes to GDB at this
point are the bare minimum required to get things compiling, GDB makes
no use of the styling information to provide any colors, that will
come later, if this commit is accepted.

This first commit in the series doesn't convert any target
disassemblers at all (the next two commits will update some targets),
so after this commit, the only color you will see in the disassembler
output, is that produced from objdump itself, e.g. from
objdump_print_addr_with_sym, where we print an address and a symbol
name, these are now printed with styling information, and so will have
colors applied (if the option is on).

Finally, my ability to pick "good" colors is ... well, terrible.  I'm
in no way committed to the colors I've picked here, so I encourage
people to suggest new colors, or wait for this commit to land, and
then patch the choice of colors.

I do have an idea about using possibly an environment variable to
allow the objdump colors to be customised, but I haven't done anything
like that in this commit, the color choices are just fixed in the code
for now.

binutils/ChangeLog:

	* NEWS: Mention new feature.
	* doc/binutils.texi (objdump): Describe --disassembler-color
	option.
	* objdump.c (disassembler_color): New global.
	(disassembler_extended_color): Likewise.
	(disassembler_in_comment): Likewise.
	(usage): Mention --disassembler-color option.
	(long_options): Add --disassembler-color option.
	(objdump_print_value): Use fprintf_styled_func instead of
	fprintf_func.
	(objdump_print_symname): Likewise.
	(objdump_print_addr_with_sym): Likewise.
	(objdump_color_for_disassembler_style): New function.
	(objdump_styled_sprintf): New function.
	(fprintf_styled): New function.
	(disassemble_jumps): Use disassemble_set_printf, and reset
	disassembler_in_comment.
	(null_styled_print): New function.
	(disassemble_bytes): Use disassemble_set_printf, and reset
	disassembler_in_comment.
	(disassemble_data): Update init_disassemble_info call.
	(main): Handle --disassembler-color option.

include/ChangeLog:

	* dis-asm.h (enum disassembler_style): New enum.
	(struct disassemble_info): Add fprintf_styled_func field, and
	created_styled_output field.
	(disassemble_set_printf): Declare.
	(init_disassemble_info): Add additional parameter.
	(INIT_DISASSEMBLE_INFO): Add additional parameter.

opcodes/ChangeLog:

	* dis-init.c (init_disassemble_info): Take extra parameter,
	initialize the new fprintf_styled_func and created_styled_output
	fields.
	* disassembler.c (disassemble_set_printf): New function definition.
---
 binutils/NEWS              |   4 +
 binutils/doc/binutils.texi |  11 ++
 binutils/objdump.c         | 249 ++++++++++++++++++++++++++++++++-----
 gdb/disasm.c               |  34 ++++-
 gdb/disasm.h               |   7 ++
 include/dis-asm.h          |  88 ++++++++++++-
 opcodes/dis-init.c         |   5 +-
 opcodes/disassemble.c      |  13 ++
 8 files changed, 371 insertions(+), 40 deletions(-)

diff --git a/binutils/NEWS b/binutils/NEWS
index 7e3c005f25c..c266bb9c3de 100644
--- a/binutils/NEWS
+++ b/binutils/NEWS
@@ -5,6 +5,10 @@
 * objcopy --weaken, --weaken-symbol, and --weaken-symbols now make ELF
   STB_GNU_UNIQUE symbols weak.
 
+* objdump now supports syntax highlighting of disassembler output for some
+  architectures.  Use the --disassembler-color=MODE command line flag, with
+  mode being either off, color, or extended-color.
+
 Changes in 2.38:
 
 * elfedit: Add --output-abiversion option to update ABIVERSION.
diff --git a/binutils/doc/binutils.texi b/binutils/doc/binutils.texi
index c73837ee27b..2c234c682aa 100644
--- a/binutils/doc/binutils.texi
+++ b/binutils/doc/binutils.texi
@@ -2270,6 +2270,7 @@
         [@option{--prefix-strip=}@var{level}]
         [@option{--insn-width=}@var{width}]
         [@option{--visualize-jumps[=color|=extended-color|=off]}
+        [@option{--disassembler-color=[color|extended-color|off]}
         [@option{-U} @var{method}] [@option{--unicode=}@var{method}]
         [@option{-V}|@option{--version}]
         [@option{-H}|@option{--help}]
@@ -2807,6 +2808,16 @@
 after it has previously been enabled then use
 @option{visualize-jumps=off}.
 
+@item --disassembler-color=[color|extended-color|off]
+Apply syntax highlighting to the disassembler output.  The
+@option{color} argument adds color using simple terminal colors.
+Alternatively the @option{extended-color} argument will use 8bit
+colors, but these might not work on all terminals.
+
+If it is necessary to disable the @option{--disassembler-color} option
+after it has previously been enabled then use
+@option{--disassembler-color=off}.
+
 @item -W[lLiaprmfFsoORtUuTgAckK]
 @itemx --dwarf[=rawline,=decodedline,=info,=abbrev,=pubnames,=aranges,=macro,=frames,=frames-interp,=str,=str-offsets,=loc,=Ranges,=pubtypes,=trace_info,=trace_abbrev,=trace_aranges,=gdb_index,=addr,=cu_index,=links,=follow-links]
 @include debug.options.texi
diff --git a/binutils/objdump.c b/binutils/objdump.c
index 8e1c9cb0c21..54c89a32db2 100644
--- a/binutils/objdump.c
+++ b/binutils/objdump.c
@@ -130,10 +130,19 @@ static bool visualize_jumps = false;	/* --visualize-jumps.  */
 static bool color_output = false;	/* --visualize-jumps=color.  */
 static bool extended_color_output = false; /* --visualize-jumps=extended-color.  */
 static int process_links = false;       /* --process-links.  */
+static bool disassembler_color = false; /* --disassembler-color=color.  */
+static bool disassembler_extended_color = false; /* --disassembler-color=extended-color.  */
 
 static int dump_any_debugging;
 static int demangle_flags = DMGL_ANSI | DMGL_PARAMS;
 
+/* This is reset to false each time we enter the disassembler, and set true
+   when the disassembler emits something in the dis_style_comment_start
+   style.  Once this is true, all further output on that line is done in
+   the comment style.  This only has an effect when disassembler coloring
+   is turned on.  */
+static bool disassembler_in_comment = false;
+
 /* A structure to record the sections mentioned in -j switches.  */
 struct only
 {
@@ -396,6 +405,10 @@ usage (FILE *stream, int status)
                                  Use extended 8-bit color codes\n"));
       fprintf (stream, _("\
       --visualize-jumps=off      Disable jump visualization\n\n"));
+      fprintf (stream, _("\
+      --disassembler-color=off   Disable disassembler color output.\n\n"));
+      fprintf (stream, _("\
+      --disassembler-color=color Use basic colors in disassembler output.\n\n"));
 
       list_supported_targets (program_name, stream);
       list_supported_architectures (program_name, stream);
@@ -437,7 +450,8 @@ enum option_values
     OPTION_CTF,
     OPTION_CTF_PARENT,
 #endif
-    OPTION_VISUALIZE_JUMPS
+    OPTION_VISUALIZE_JUMPS,
+    OPTION_DISASSEMBLER_COLOR
   };
 
 static struct option long_options[]=
@@ -503,6 +517,7 @@ static struct option long_options[]=
   {"version", no_argument, NULL, 'V'},
   {"visualize-jumps", optional_argument, 0, OPTION_VISUALIZE_JUMPS},
   {"wide", no_argument, NULL, 'w'},
+  {"disassembler-color", required_argument, NULL, OPTION_DISASSEMBLER_COLOR},
   {NULL, no_argument, NULL, 0}
 };
 \f
@@ -1246,7 +1261,7 @@ objdump_print_value (bfd_vma vma, struct disassemble_info *inf,
       if (*p == '\0')
 	--p;
     }
-  (*inf->fprintf_func) (inf->stream, "%s", p);
+  (*inf->fprintf_styled_func) (inf->stream, dis_style_address, "%s", p);
 }
 
 /* Print the name of a symbol.  */
@@ -1280,10 +1295,11 @@ objdump_print_symname (bfd *abfd, struct disassemble_info *inf,
 
   if (inf != NULL)
     {
-      (*inf->fprintf_func) (inf->stream, "%s", name);
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_symbol, "%s", name);
       if (version_string && *version_string != '\0')
-	(*inf->fprintf_func) (inf->stream, hidden ? "@%s" : "@@%s",
-			      version_string);
+	(*inf->fprintf_styled_func) (inf->stream, dis_style_symbol,
+				     hidden ? "@%s" : "@@%s",
+				     version_string);
     }
   else
     {
@@ -1545,31 +1561,33 @@ objdump_print_addr_with_sym (bfd *abfd, asection *sec, asymbol *sym,
   if (!no_addresses)
     {
       objdump_print_value (vma, inf, skip_zeroes);
-      (*inf->fprintf_func) (inf->stream, " ");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, " ");
     }
 
   if (sym == NULL)
     {
       bfd_vma secaddr;
 
-      (*inf->fprintf_func) (inf->stream, "<%s",
-			    sanitize_string (bfd_section_name (sec)));
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text,"<");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_symbol, "%s",
+				   sanitize_string (bfd_section_name (sec)));
       secaddr = bfd_section_vma (sec);
       if (vma < secaddr)
 	{
-	  (*inf->fprintf_func) (inf->stream, "-0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate,
+				       "-0x");
 	  objdump_print_value (secaddr - vma, inf, true);
 	}
       else if (vma > secaddr)
 	{
-	  (*inf->fprintf_func) (inf->stream, "+0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate, "+0x");
 	  objdump_print_value (vma - secaddr, inf, true);
 	}
-      (*inf->fprintf_func) (inf->stream, ">");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, ">");
     }
   else
     {
-      (*inf->fprintf_func) (inf->stream, "<");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, "<");
 
       objdump_print_symname (abfd, inf, sym);
 
@@ -1586,21 +1604,22 @@ objdump_print_addr_with_sym (bfd *abfd, asection *sec, asymbol *sym,
 	;
       else if (bfd_asymbol_value (sym) > vma)
 	{
-	  (*inf->fprintf_func) (inf->stream, "-0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate,"-0x");
 	  objdump_print_value (bfd_asymbol_value (sym) - vma, inf, true);
 	}
       else if (vma > bfd_asymbol_value (sym))
 	{
-	  (*inf->fprintf_func) (inf->stream, "+0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_immediate, "+0x");
 	  objdump_print_value (vma - bfd_asymbol_value (sym), inf, true);
 	}
 
-      (*inf->fprintf_func) (inf->stream, ">");
+      (*inf->fprintf_styled_func) (inf->stream, dis_style_text, ">");
     }
 
   if (display_file_offsets)
-    inf->fprintf_func (inf->stream, _(" (File Offset: 0x%lx)"),
-			(long int)(sec->filepos + (vma - sec->vma)));
+    inf->fprintf_styled_func (inf->stream, dis_style_text,
+			      _(" (File Offset: 0x%lx)"),
+			      (long int)(sec->filepos + (vma - sec->vma)));
 }
 
 /* Print an address (VMA), symbolically if possible.
@@ -2127,6 +2146,143 @@ objdump_sprintf (SFILE *f, const char *format, ...)
   return n;
 }
 
+/* Return an integer greater than, or equal to zero, representing the color
+   for STYLE, or -1 if no color should be used.  */
+
+static int
+objdump_color_for_disassembler_style (enum disassembler_style style)
+{
+  int color = -1;
+
+  if (style == dis_style_comment_start)
+    disassembler_in_comment = true;
+
+  if (disassembler_color)
+    {
+      if (disassembler_in_comment)
+	return color;
+
+      switch (style)
+	{
+	case dis_style_symbol: color = 32; break;
+        case dis_style_assembler_directive:
+	case dis_style_mnemonic: color = 33; break;
+	case dis_style_register: color = 34; break;
+	case dis_style_address:
+        case dis_style_address_offset:
+	case dis_style_immediate: color = 35; break;
+	default:
+	case dis_style_text: color = -1; break;
+	}
+    }
+  else if (disassembler_extended_color)
+    {
+      if (disassembler_in_comment)
+	return 250;
+
+      switch (style)
+	{
+	case dis_style_symbol: color = 40; break;
+        case dis_style_assembler_directive:
+	case dis_style_mnemonic: color = 142; break;
+	case dis_style_register: color = 27; break;
+	case dis_style_address:
+        case dis_style_address_offset:
+	case dis_style_immediate: color = 134; break;
+	default:
+	case dis_style_text: color = -1; break;
+	}
+    }
+
+  return color;
+}
+
+/* Like objdump_sprintf, but add in escape sequences to highlight the
+   content according to STYLE.  */
+
+static int ATTRIBUTE_PRINTF_3
+objdump_styled_sprintf (SFILE *f, enum disassembler_style style,
+			const char *format, ...)
+{
+  size_t n;
+  va_list args;
+  int color = objdump_color_for_disassembler_style (style);
+
+  if (color >= 0)
+    {
+      while (1)
+	{
+	  size_t space = f->alloc - f->pos;
+
+	  if (disassembler_color)
+	    n = snprintf (f->buffer + f->pos, space, "\033[%dm", color);
+	  else
+	    n = snprintf (f->buffer + f->pos, space, "\033[38;5;%dm", color);
+	  if (space > n)
+	    break;
+
+	  f->alloc = (f->alloc + n) * 2;
+	  f->buffer = (char *) xrealloc (f->buffer, f->alloc);
+	}
+      f->pos += n;
+    }
+
+  while (1)
+    {
+      size_t space = f->alloc - f->pos;
+
+      va_start (args, format);
+      n = vsnprintf (f->buffer + f->pos, space, format, args);
+      va_end (args);
+
+      if (space > n)
+	break;
+
+      f->alloc = (f->alloc + n) * 2;
+      f->buffer = (char *) xrealloc (f->buffer, f->alloc);
+    }
+  f->pos += n;
+
+  if (color >= 0)
+    {
+      while (1)
+	{
+	  size_t space = f->alloc - f->pos;
+
+	  n = snprintf (f->buffer + f->pos, space, "\033[0m");
+
+	  if (space > n)
+	    break;
+
+	  f->alloc = (f->alloc + n) * 2;
+	  f->buffer = (char *) xrealloc (f->buffer, f->alloc);
+	}
+      f->pos += n;
+    }
+
+  return n;
+}
+
+/* We discard the styling information here.  This function is only used
+   when objdump is printing auxiliary information, the symbol headers, and
+   disassembly address, or the bytes of the disassembled instruction.  We
+   don't (currently) apply styling to any of this stuff, so, for now, just
+   print the content with no additional style added.  */
+
+static int ATTRIBUTE_PRINTF_3
+fprintf_styled (FILE *f, enum disassembler_style style ATTRIBUTE_UNUSED,
+		const char *fmt, ...)
+{
+  int res;
+  va_list ap;
+
+  va_start (ap, fmt);
+  res = vfprintf (f, fmt, ap);
+  va_end (ap);
+
+  return res;
+}
+
 /* Code for generating (colored) diagrams of control flow start and end
    points.  */
 
@@ -2558,8 +2714,8 @@ disassemble_jumps (struct disassemble_info * inf,
   sfile.pos = 0;
 
   inf->insn_info_valid = 0;
-  inf->fprintf_func = (fprintf_ftype) objdump_sprintf;
-  inf->stream = &sfile;
+  disassemble_set_printf (inf, &sfile, (fprintf_ftype) objdump_sprintf,
+			  (fprintf_styled_ftype) objdump_styled_sprintf);
 
   addr_offset = start_offset;
   while (addr_offset < stop_offset)
@@ -2621,6 +2777,7 @@ disassemble_jumps (struct disassemble_info * inf,
 
       /* Extract jump information.  */
       inf->insn_info_valid = 0;
+      disassembler_in_comment = false;
       octets = (*disassemble_fn) (section->vma + addr_offset, inf);
       /* Test if a jump was detected.  */
       if (inf->insn_info_valid
@@ -2641,9 +2798,8 @@ disassemble_jumps (struct disassemble_info * inf,
       addr_offset += octets / opb;
     }
 
-  inf->fprintf_func = (fprintf_ftype) fprintf;
-  inf->stream = stdout;
-
+  disassemble_set_printf (inf, (void *) stdout, (fprintf_ftype) fprintf,
+			  (fprintf_styled_ftype) fprintf_styled);
   free (sfile.buffer);
 
   /* Merge jumps.  */
@@ -2736,6 +2892,17 @@ null_print (const void * stream ATTRIBUTE_UNUSED, const char * format ATTRIBUTE_
   return 1;
 }
 
+/* Like null_print, but takes the extra STYLE argument.  As this is not
+   going to print anything, the extra argument is just ignored.  */
+
+static int
+null_styled_print (const void * stream ATTRIBUTE_UNUSED,
+		   enum disassembler_style style ATTRIBUTE_UNUSED,
+		   const char * format ATTRIBUTE_UNUSED, ...)
+{
+  return 1;
+}
+
 /* Print out jump visualization.  */
 
 static void
@@ -2946,8 +3113,9 @@ disassemble_bytes (struct disassemble_info *inf,
 	      int insn_size;
 
 	      sfile.pos = 0;
-	      inf->fprintf_func = (fprintf_ftype) objdump_sprintf;
-	      inf->stream = &sfile;
+	      disassemble_set_printf
+		(inf, &sfile, (fprintf_ftype) objdump_sprintf,
+		 (fprintf_styled_ftype) objdump_styled_sprintf);
 	      inf->bytes_per_line = 0;
 	      inf->bytes_per_chunk = 0;
 	      inf->flags = ((disassemble_all ? DISASSEMBLE_DATA : 0)
@@ -2989,10 +3157,15 @@ disassemble_bytes (struct disassemble_info *inf,
 			     twice, but we only do this when there is a high
 			     probability that there is a reloc that will
 			     affect the instruction.  */
-			  inf->fprintf_func = (fprintf_ftype) null_print;
+			  disassemble_set_printf
+			    (inf, inf->stream, (fprintf_ftype) null_print,
+			     (fprintf_styled_ftype) null_styled_print);
 			  insn_size = disassemble_fn (section->vma
 						      + addr_offset, inf);
-			  inf->fprintf_func = (fprintf_ftype) objdump_sprintf;
+			  disassemble_set_printf
+			    (inf, inf->stream,
+			     (fprintf_ftype) objdump_sprintf,
+			     (fprintf_styled_ftype) objdump_styled_sprintf);
 			}
 		    }
 
@@ -3017,12 +3190,13 @@ disassemble_bytes (struct disassemble_info *inf,
 		inf->stop_vma = section->vma + stop_offset;
 
 	      inf->stop_offset = stop_offset;
+	      disassembler_in_comment = false;
 	      insn_size = (*disassemble_fn) (section->vma + addr_offset, inf);
 	      octets = insn_size;
 
 	      inf->stop_vma = 0;
-	      inf->fprintf_func = (fprintf_ftype) fprintf;
-	      inf->stream = stdout;
+	      disassemble_set_printf (inf, stdout, (fprintf_ftype) fprintf,
+				      (fprintf_styled_ftype) fprintf_styled);
 	      if (insn_width == 0 && inf->bytes_per_line != 0)
 		octets_per_line = inf->bytes_per_line;
 	      if (insn_size < (int) opb)
@@ -3583,8 +3757,9 @@ disassemble_section (bfd *abfd, asection *section, void *inf)
 	      sf.alloc = strlen (sym->name) + 40;
 	      sf.buffer = (char*) xmalloc (sf.alloc);
 	      sf.pos = 0;
-	      di.fprintf_func = (fprintf_ftype) objdump_sprintf;
-	      di.stream = &sf;
+	      disassemble_set_printf
+		(&di, &sf, (fprintf_ftype) objdump_sprintf,
+		 (fprintf_styled_ftype) objdump_styled_sprintf);
 
 	      objdump_print_symname (abfd, &di, sym);
 
@@ -3653,8 +3828,8 @@ disassemble_data (bfd *abfd)
       ++sorted_symcount;
     }
 
-  init_disassemble_info (&disasm_info, stdout, (fprintf_ftype) fprintf);
-
+  init_disassemble_info (&disasm_info, stdout, (fprintf_ftype) fprintf,
+			 (fprintf_styled_ftype) fprintf_styled);
   disasm_info.application_data = (void *) &aux;
   aux.abfd = abfd;
   aux.require_sec = false;
@@ -5495,6 +5670,16 @@ main (int argc, char **argv)
 		nonfatal (_("unrecognized argument to --visualize-option"));
 	    }
 	  break;
+	case OPTION_DISASSEMBLER_COLOR:
+	  if (streq (optarg, "off"))
+	    disassembler_color = false;
+	  else if (streq (optarg, "color"))
+	    disassembler_color = true;
+	  else if (streq (optarg, "extended-color"))
+	    disassembler_extended_color = true;
+	  else
+	    nonfatal (_("unrecognized argument to --disassembler-color"));
+	  break;
 	case 'E':
 	  if (strcmp (optarg, "B") == 0)
 	    endian = BFD_ENDIAN_BIG;
diff --git a/gdb/disasm.c b/gdb/disasm.c
index b4cde801cb0..a1ab4d71cdd 100644
--- a/gdb/disasm.c
+++ b/gdb/disasm.c
@@ -177,6 +177,22 @@ gdb_disassembler::dis_asm_fprintf (void *stream, const char *format, ...)
   return 0;
 }
 
+/* See disasm.h.  */
+
+int
+gdb_disassembler::dis_asm_styled_fprintf (void *stream,
+					  enum disassembler_style style,
+					  const char *format, ...)
+{
+  va_list args;
+
+  va_start (args, format);
+  vfprintf_filtered ((struct ui_file *) stream, format, args);
+  va_end (args);
+  /* Something non -ve.  */
+  return 0;
+}
+
 static bool
 line_is_less_than (const deprecated_dis_line_entry &mle1,
 		   const deprecated_dis_line_entry &mle2)
@@ -787,7 +803,8 @@ gdb_disassembler::gdb_disassembler (struct gdbarch *gdbarch,
 	      && file->can_emit_style_escape ()),
     m_dest (file)
 {
-  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf);
+  init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf,
+			 dis_asm_styled_fprintf);
   m_di.flavour = bfd_target_unknown_flavour;
   m_di.memory_error_func = dis_asm_memory_error;
   m_di.print_address_func = dis_asm_print_address;
@@ -954,12 +971,25 @@ gdb_disasm_null_printf (void *stream, const char *format, ...)
   return 0;
 }
 
+/* An fprintf-function for use by the disassembler when we know we don't
+   want to print anything, and the disassembler is using style.  Always
+   returns success.  */
+
+static int ATTRIBUTE_PRINTF (3, 4)
+gdb_disasm_null_styled_printf (void *stream,
+			       enum disassembler_style style,
+			       const char *format, ...)
+{
+  return 0;
+}
+
 /* See disasm.h.  */
 
 void
 init_disassemble_info_for_no_printing (struct disassemble_info *dinfo)
 {
-  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf);
+  init_disassemble_info (dinfo, nullptr, gdb_disasm_null_printf,
+			 gdb_disasm_null_styled_printf);
 }
 
 /* Initialize a struct disassemble_info for gdb_buffered_insn_length.
diff --git a/gdb/disasm.h b/gdb/disasm.h
index 399afc5ae71..b71cd097a16 100644
--- a/gdb/disasm.h
+++ b/gdb/disasm.h
@@ -110,6 +110,13 @@ class gdb_disassembler
   static int dis_asm_fprintf (void *stream, const char *format, ...)
     ATTRIBUTE_PRINTF(2,3);
 
+  /* Print formatted message to STREAM, the content can be styled based on
+     STYLE if desired.  */
+  static int dis_asm_styled_fprintf (void *stream,
+				     enum disassembler_style style,
+				     const char *format, ...)
+    ATTRIBUTE_PRINTF(3,4);
+
   static int dis_asm_read_memory (bfd_vma memaddr, gdb_byte *myaddr,
 				  unsigned int len,
 				  struct disassemble_info *info);
diff --git a/include/dis-asm.h b/include/dis-asm.h
index 317592448a2..4f91df12498 100644
--- a/include/dis-asm.h
+++ b/include/dis-asm.h
@@ -35,8 +35,6 @@ extern "C" {
 #include <string.h>
 #include "bfd.h"
 
-  typedef int (*fprintf_ftype) (void *, const char*, ...) ATTRIBUTE_FPTR_PRINTF_2;
-
 enum dis_insn_type
 {
   dis_noninsn,			/* Not a valid instruction.  */
@@ -49,6 +47,76 @@ enum dis_insn_type
   dis_dref2			/* Two data references in instruction.  */
 };
 
+/* When printing styled disassembler output, this describes what style
+   should be used.  */
+
+enum disassembler_style
+{
+  /* This is the default style, use this for any additional syntax
+     (e.g. commas between operands, brackets, etc), or just as a default if
+     no other style seems appropriate.  */
+  dis_style_text,
+
+  /* Use this for all instruction mnemonics, or aliases for mnemonics.
+     These should be things that correspond to real machine
+     instructions.  */
+  dis_style_mnemonic,
+
+  /* For things that aren't real machine instructions, but rather
+     assembler directives, e.g. .byte, etc.  */
+  dis_style_assembler_directive,
+
+  /* Use this for any register names.  This may or may-not include any
+     register prefix, e.g. '$', '%', at the discretion of the target,
+     though within each target the choice to include prefixes for not
+     should be kept consistent.  If the prefix is not printed with this
+     style, then dis_style_text should be used.  */
+  dis_style_register,
+
+  /* Use this for any constant values used within instructions or
+     directives, unless the value is an absolute address, or an offset
+     that will be added to an address (no matter where the address comes
+     from) before use.  This style may, or may-not be used for any
+     prefix to the immediate value, e.g. '$', at the discretion of the
+     target, though within each target the choice to include these
+     prefixes should be kept consistent.  */
+  dis_style_immediate,
+
+  /* The style for the numerical representation of an absolute address.
+     Anything that is an address offset should use the immediate style.
+     This style may, or may-not be used for any prefix to the immediate
+     value, e.g. '$', at the discretion of the target, though within
+     each target the choice to include these prefixes should be kept
+     consistent.  */
+  dis_style_address,
+
+  /* The style for any constant value within an instruction or directive
+     that represents an offset that will be added to an address before
+     use.  This style may, or may-not be used for any prefix to the
+     immediate value, e.g. '$', at the discretion of the target, though
+     within each target the choice to include these prefixes should be
+     kept consistent.  */
+  dis_style_address_offset,
+
+  /* The style for a symbol's name.  The numerical address of a symbol
+     should use the address style above, this style is reserved for the
+     name.  */
+  dis_style_symbol,
+
+  /* The start of a comment that runs to the end of the line.  Anything
+     printed after a comment start might be styled differently,
+     e.g. everything might be styled as a comment, regardless of the
+     actual style used.  The disassembler itself should not try to adjust
+     the style emitted for comment content, e.g. an address emitted within
+     a comment should still be given dis_style_address, in this way it is
+     up to the user of the disassembler to decide how comments should be
+     styled.  */
+  dis_style_comment_start
+};
+
+typedef int (*fprintf_ftype) (void *, const char*, ...) ATTRIBUTE_FPTR_PRINTF_2;
+typedef int (*fprintf_styled_ftype) (void *, enum disassembler_style, const char*, ...) ATTRIBUTE_FPTR_PRINTF_3;
+
 /* This struct is passed into the instruction decoding routine,
    and is passed back out into each callback.  The various fields are used
    for conveying information from your main routine into your callbacks,
@@ -62,6 +130,7 @@ enum dis_insn_type
 typedef struct disassemble_info
 {
   fprintf_ftype fprintf_func;
+  fprintf_styled_ftype fprintf_styled_func;
   void *stream;
   void *application_data;
 
@@ -228,6 +297,9 @@ typedef struct disassemble_info
      disassembling such as the way mapping symbols are found on AArch64.  */
   bfd_vma stop_offset;
 
+  /* Set to true if the disassembler applied styling to the output,
+     otherwise, set to false.  */
+  bool created_styled_output;
 } disassemble_info;
 
 /* This struct is used to pass information about valid disassembler
@@ -337,6 +409,10 @@ extern void disassemble_init_for_target (struct disassemble_info *);
 /* Tidy any memory allocated by targets, such as info->private_data.  */
 extern void disassemble_free_target (struct disassemble_info *);
 
+/* Set the basic disassembler print functions.  */
+extern void disassemble_set_printf (struct disassemble_info *, void *,
+				    fprintf_ftype, fprintf_styled_ftype);
+
 /* Document any target specific options available from the disassembler.  */
 extern void disassembler_usage (FILE *);
 
@@ -394,11 +470,13 @@ extern bool generic_symbol_is_valid
 /* Method to initialize a disassemble_info struct.  This should be
    called by all applications creating such a struct.  */
 extern void init_disassemble_info (struct disassemble_info *dinfo, void *stream,
-				   fprintf_ftype fprintf_func);
+				   fprintf_ftype fprintf_func,
+				   fprintf_styled_ftype fprintf_styled_func);
 
 /* For compatibility with existing code.  */
-#define INIT_DISASSEMBLE_INFO(INFO, STREAM, FPRINTF_FUNC) \
-  init_disassemble_info (&(INFO), (STREAM), (fprintf_ftype) (FPRINTF_FUNC))
+#define INIT_DISASSEMBLE_INFO(INFO, STREAM, FPRINTF_FUNC, FPRINTF_STYLED_FUNC)  \
+  init_disassemble_info (&(INFO), (STREAM), (fprintf_ftype) (FPRINTF_FUNC), \
+			 (fprintf_styled_ftype) (FPRINTF_STYLED_FUNC))
 
 #ifdef __cplusplus
 }
diff --git a/opcodes/dis-init.c b/opcodes/dis-init.c
index 0b171c0f36d..59039ef8b11 100644
--- a/opcodes/dis-init.c
+++ b/opcodes/dis-init.c
@@ -25,7 +25,8 @@
 
 void
 init_disassemble_info (struct disassemble_info *info, void *stream,
-		       fprintf_ftype fprintf_func)
+		       fprintf_ftype fprintf_func,
+		       fprintf_styled_ftype fprintf_styled_func)
 {
   memset (info, 0, sizeof (*info));
 
@@ -35,6 +36,7 @@ init_disassemble_info (struct disassemble_info *info, void *stream,
   info->endian_code = info->endian;
   info->octets_per_byte = 1;
   info->fprintf_func = fprintf_func;
+  info->fprintf_styled_func = fprintf_styled_func;
   info->stream = stream;
   info->read_memory_func = buffer_read_memory;
   info->memory_error_func = perror_memory;
@@ -42,5 +44,6 @@ init_disassemble_info (struct disassemble_info *info, void *stream,
   info->symbol_at_address_func = generic_symbol_at_address;
   info->symbol_is_valid = generic_symbol_is_valid;
   info->display_endian = BFD_ENDIAN_UNKNOWN;
+  info->created_styled_output = false;
 }
 
diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index e613935eff8..15df49ea331 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -860,3 +860,16 @@ opcodes_assert (const char *file, int line)
   opcodes_error_handler (_("Please report this bug"));
   abort ();
 }
+
+/* Set the stream, and the styled and unstyled printf functions within
+   INFO.  */
+
+void
+disassemble_set_printf (struct disassemble_info *info, void *stream,
+			fprintf_ftype unstyled_printf,
+			fprintf_styled_ftype styled_printf)
+{
+  info->stream = stream;
+  info->fprintf_func = unstyled_printf;
+  info->fprintf_styled_func = styled_printf;
+}
-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 2/3] opcodes/riscv: implement style support in the disassembler
  2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
  2022-03-21 14:33   ` [PATCHv2 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
@ 2022-03-21 14:33   ` Andrew Burgess
  2022-03-21 14:33   ` [PATCHv2 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
  2022-03-24 17:08   ` [PATCHv2 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nick Clifton
  3 siblings, 0 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-03-21 14:33 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

Update the RISC-V disassembler to supply style information.  This
allows objdump to apply syntax highlighting to the disassembler
output (when the appropriate command line flag is used).

Ignoring colours, there should be no other user visible changes in the
output of the disassembler in either objdump or gdb.

opcodes/ChangeLog:

	* disassembler.c (disassemble_init_for_target): Set
	created_styled_output for riscv.
	* riscv-dis.c: Changed throughout to use fprintf_styled_func
	instead of fprintf_func.
---
 opcodes/disassemble.c |   1 +
 opcodes/riscv-dis.c   | 193 ++++++++++++++++++++++++++----------------
 2 files changed, 122 insertions(+), 72 deletions(-)

diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index 15df49ea331..b8eed9695ea 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -708,6 +708,7 @@ disassemble_init_for_target (struct disassemble_info * info)
 #ifdef ARCH_riscv
     case bfd_arch_riscv:
       info->symbol_is_valid = riscv_symbol_is_valid;
+      info->created_styled_output = true;
       break;
 #endif
 #ifdef ARCH_wasm32
diff --git a/opcodes/riscv-dis.c b/opcodes/riscv-dis.c
index 57b798d8e14..bfaefa3fb47 100644
--- a/opcodes/riscv-dis.c
+++ b/opcodes/riscv-dis.c
@@ -165,7 +165,7 @@ arg_print (struct disassemble_info *info, unsigned long val,
 	   const char* const* array, size_t size)
 {
   const char *s = val >= size || array[val] == NULL ? "unknown" : array[val];
-  (*info->fprintf_func) (info->stream, "%s", s);
+  (*info->fprintf_styled_func) (info->stream, dis_style_text, "%s", s);
 }
 
 static void
@@ -195,11 +195,11 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
   struct riscv_private_data *pd = info->private_data;
   int rs1 = (l >> OP_SH_RS1) & OP_MASK_RS1;
   int rd = (l >> OP_SH_RD) & OP_MASK_RD;
-  fprintf_ftype print = info->fprintf_func;
+  fprintf_styled_ftype print = info->fprintf_styled_func;
   const char *opargStart;
 
   if (*oparg != '\0')
-    print (info->stream, "\t");
+    print (info->stream, dis_style_text, "\t");
 
   for (; *oparg != '\0'; oparg++)
     {
@@ -211,22 +211,24 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    {
 	    case 's': /* RS1 x8-x15.  */
 	    case 'w': /* RS1 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS1S, l) + 8]);
 	      break;
 	    case 't': /* RS2 x8-x15.  */
 	    case 'x': /* RS2 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS2S, l) + 8]);
 	      break;
 	    case 'U': /* RS1, constrained to equal RD.  */
-	      print (info->stream, "%s", riscv_gpr_names[rd]);
+	      print (info->stream, dis_style_register,
+		     "%s", riscv_gpr_names[rd]);
 	      break;
 	    case 'c': /* RS1, constrained to equal sp.  */
-	      print (info->stream, "%s", riscv_gpr_names[X_SP]);
+	      print (info->stream, dis_style_register, "%s",
+		     riscv_gpr_names[X_SP]);
 	      break;
 	    case 'V': /* RS2 */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_gpr_names[EXTRACT_OPERAND (CRS2, l)]);
 	      break;
 	    case 'o':
@@ -236,31 +238,40 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      if (info->mach == bfd_mach_riscv64
 		  && ((l & MASK_C_ADDIW) == MATCH_C_ADDIW) && rd != 0)
 		maybe_print_address (pd, rd, EXTRACT_CITYPE_IMM (l), 1);
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d",
+		     (int)EXTRACT_CITYPE_IMM (l));
 	      break;
 	    case 'k':
-	      print (info->stream, "%d", (int)EXTRACT_CLTYPE_LW_IMM (l));
+	      print (info->stream, dis_style_address_offset, "%d",
+		     (int)EXTRACT_CLTYPE_LW_IMM (l));
 	      break;
 	    case 'l':
-	      print (info->stream, "%d", (int)EXTRACT_CLTYPE_LD_IMM (l));
+	      print (info->stream, dis_style_address_offset, "%d",
+		     (int)EXTRACT_CLTYPE_LD_IMM (l));
 	      break;
 	    case 'm':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_LWSP_IMM (l));
+	      print (info->stream, dis_style_address_offset, "%d",
+		     (int)EXTRACT_CITYPE_LWSP_IMM (l));
 	      break;
 	    case 'n':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_LDSP_IMM (l));
+	      print (info->stream, dis_style_address_offset, "%d",
+		     (int)EXTRACT_CITYPE_LDSP_IMM (l));
 	      break;
 	    case 'K':
-	      print (info->stream, "%d", (int)EXTRACT_CIWTYPE_ADDI4SPN_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d",
+		     (int)EXTRACT_CIWTYPE_ADDI4SPN_IMM (l));
 	      break;
 	    case 'L':
-	      print (info->stream, "%d", (int)EXTRACT_CITYPE_ADDI16SP_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d",
+		     (int)EXTRACT_CITYPE_ADDI16SP_IMM (l));
 	      break;
 	    case 'M':
-	      print (info->stream, "%d", (int)EXTRACT_CSSTYPE_SWSP_IMM (l));
+	      print (info->stream, dis_style_address_offset, "%d",
+		     (int)EXTRACT_CSSTYPE_SWSP_IMM (l));
 	      break;
 	    case 'N':
-	      print (info->stream, "%d", (int)EXTRACT_CSSTYPE_SDSP_IMM (l));
+	      print (info->stream, dis_style_address_offset, "%d",
+		     (int)EXTRACT_CSSTYPE_SDSP_IMM (l));
 	      break;
 	    case 'p':
 	      info->target = EXTRACT_CBTYPE_IMM (l) + pc;
@@ -271,21 +282,23 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      (*info->print_address_func) (info->target, info);
 	      break;
 	    case 'u':
-	      print (info->stream, "0x%x",
+	      print (info->stream, dis_style_immediate, "0x%x",
 		     (int)(EXTRACT_CITYPE_IMM (l) & (RISCV_BIGIMM_REACH-1)));
 	      break;
 	    case '>':
-	      print (info->stream, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x3f);
+	      print (info->stream, dis_style_immediate, "0x%x",
+		     (int)EXTRACT_CITYPE_IMM (l) & 0x3f);
 	      break;
 	    case '<':
-	      print (info->stream, "0x%x", (int)EXTRACT_CITYPE_IMM (l) & 0x1f);
+	      print (info->stream, dis_style_immediate, "0x%x",
+		     (int)EXTRACT_CITYPE_IMM (l) & 0x1f);
 	      break;
 	    case 'T': /* Floating-point RS2.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_fpr_names[EXTRACT_OPERAND (CRS2, l)]);
 	      break;
 	    case 'D': /* Floating-point RS2 x8-x15.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_fpr_names[EXTRACT_OPERAND (CRS2S, l) + 8]);
 	      break;
 	    }
@@ -296,28 +309,30 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    {
 	    case 'd':
 	    case 'f':
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VD, l)]);
 	      break;
 	    case 'e':
 	      if (!EXTRACT_OPERAND (VWD, l))
-		print (info->stream, "%s", riscv_gpr_names[0]);
+		print (info->stream, dis_style_register, "%s",
+		       riscv_gpr_names[0]);
 	      else
-		print (info->stream, "%s",
+		print (info->stream, dis_style_register, "%s",
 		       riscv_vecr_names_numeric[EXTRACT_OPERAND (VD, l)]);
 	      break;
 	    case 's':
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VS1, l)]);
 	      break;
 	    case 't':
 	    case 'u': /* VS1 == VS2 already verified at this point.  */
 	    case 'v': /* VD == VS1 == VS2 already verified at this point.  */
-	      print (info->stream, "%s",
+	      print (info->stream, dis_style_register, "%s",
 		     riscv_vecr_names_numeric[EXTRACT_OPERAND (VS2, l)]);
 	      break;
 	    case '0':
-	      print (info->stream, "%s", riscv_vecr_names_numeric[0]);
+	      print (info->stream, dis_style_register, "%s",
+		     riscv_vecr_names_numeric[0]);
 	      break;
 	    case 'b':
 	    case 'c':
@@ -337,25 +352,30 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 		    && !imm_vtype_res
 		    && riscv_vsew[imm_vsew] != NULL
 		    && riscv_vlmul[imm_vlmul] != NULL)
-		  print (info->stream, "%s,%s,%s,%s", riscv_vsew[imm_vsew],
+		  print (info->stream, dis_style_text, "%s,%s,%s,%s",
+			 riscv_vsew[imm_vsew],
 			 riscv_vlmul[imm_vlmul], riscv_vta[imm_vta],
 			 riscv_vma[imm_vma]);
 		else
-		  print (info->stream, "%d", imm);
+		  print (info->stream, dis_style_immediate, "%d", imm);
 	      }
 	      break;
 	    case 'i':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_VI_IMM (l));
+	      print (info->stream, dis_style_immediate, "%d",
+		     (int)EXTRACT_RVV_VI_IMM (l));
 	      break;
 	    case 'j':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_VI_UIMM (l));
+	      print (info->stream, dis_style_immediate, "%d",
+		     (int)EXTRACT_RVV_VI_UIMM (l));
 	      break;
 	    case 'k':
-	      print (info->stream, "%d", (int)EXTRACT_RVV_OFFSET (l));
+	      print (info->stream, dis_style_immediate, "%d",
+		     (int)EXTRACT_RVV_OFFSET (l));
 	      break;
 	    case 'm':
 	      if (! EXTRACT_OPERAND (VMASK, l))
-		print (info->stream, ",%s", riscv_vecm_names_numeric[0]);
+		print (info->stream, dis_style_register, ",%s",
+		       riscv_vecm_names_numeric[0]);
 	      break;
 	    }
 	  break;
@@ -365,29 +385,29 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	case ')':
 	case '[':
 	case ']':
-	  print (info->stream, "%c", *oparg);
+	  print (info->stream, dis_style_text, "%c", *oparg);
 	  break;
 
 	case '0':
 	  /* Only print constant 0 if it is the last argument.  */
 	  if (!oparg[1])
-	    print (info->stream, "0");
+	    print (info->stream, dis_style_immediate, "0");
 	  break;
 
 	case 'b':
 	case 's':
 	  if ((l & MASK_JALR) == MATCH_JALR)
 	    maybe_print_address (pd, rs1, 0, 0);
-	  print (info->stream, "%s", riscv_gpr_names[rs1]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[rs1]);
 	  break;
 
 	case 't':
-	  print (info->stream, "%s",
+	  print (info->stream, dis_style_register, "%s",
 		 riscv_gpr_names[EXTRACT_OPERAND (RS2, l)]);
 	  break;
 
 	case 'u':
-	  print (info->stream, "0x%x",
+	  print (info->stream, dis_style_immediate, "0x%x",
 		 (unsigned)EXTRACT_UTYPE_IMM (l) >> RISCV_IMM_BITS);
 	  break;
 
@@ -416,16 +436,19 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	  if (info->mach == bfd_mach_riscv64
 	      && ((l & MASK_ADDIW) == MATCH_ADDIW) && rs1 != 0)
 	    maybe_print_address (pd, rs1, EXTRACT_ITYPE_IMM (l), 1);
-	  print (info->stream, "%d", (int)EXTRACT_ITYPE_IMM (l));
+	  print (info->stream, dis_style_immediate, "%d",
+		 (int)EXTRACT_ITYPE_IMM (l));
 	  break;
 
 	case 'q':
 	  maybe_print_address (pd, rs1, EXTRACT_STYPE_IMM (l), 0);
-	  print (info->stream, "%d", (int)EXTRACT_STYPE_IMM (l));
+	  print (info->stream, dis_style_address_offset, "%d",
+		 (int)EXTRACT_STYPE_IMM (l));
 	  break;
 
 	case 'f':
-	  print (info->stream, "%d", (int)EXTRACT_STYPE_IMM (l));
+	  print (info->stream, dis_style_address_offset, "%d",
+		 (int)EXTRACT_STYPE_IMM (l));
 	  break;
 
 	case 'a':
@@ -445,40 +468,45 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	    pd->hi_addr[rd] = EXTRACT_UTYPE_IMM (l);
 	  else if ((l & MASK_C_LUI) == MATCH_C_LUI)
 	    pd->hi_addr[rd] = EXTRACT_CITYPE_LUI_IMM (l);
-	  print (info->stream, "%s", riscv_gpr_names[rd]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[rd]);
 	  break;
 
 	case 'y':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (BS, l));
+	  print (info->stream, dis_style_text, "0x%x",
+		 (int)EXTRACT_OPERAND (BS, l));
 	  break;
 
 	case 'z':
-	  print (info->stream, "%s", riscv_gpr_names[0]);
+	  print (info->stream, dis_style_register, "%s", riscv_gpr_names[0]);
 	  break;
 
 	case '>':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (SHAMT, l));
+	  print (info->stream, dis_style_immediate, "0x%x",
+		 (int)EXTRACT_OPERAND (SHAMT, l));
 	  break;
 
 	case '<':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (SHAMTW, l));
+	  print (info->stream, dis_style_immediate, "0x%x",
+		 (int)EXTRACT_OPERAND (SHAMTW, l));
 	  break;
 
 	case 'S':
 	case 'U':
-	  print (info->stream, "%s", riscv_fpr_names[rs1]);
+	  print (info->stream, dis_style_register, "%s", riscv_fpr_names[rs1]);
 	  break;
 
 	case 'T':
-	  print (info->stream, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS2, l)]);
+	  print (info->stream, dis_style_register, "%s",
+		 riscv_fpr_names[EXTRACT_OPERAND (RS2, l)]);
 	  break;
 
 	case 'D':
-	  print (info->stream, "%s", riscv_fpr_names[rd]);
+	  print (info->stream, dis_style_register, "%s", riscv_fpr_names[rd]);
 	  break;
 
 	case 'R':
-	  print (info->stream, "%s", riscv_fpr_names[EXTRACT_OPERAND (RS3, l)]);
+	  print (info->stream, dis_style_register, "%s",
+		 riscv_fpr_names[EXTRACT_OPERAND (RS3, l)]);
 	  break;
 
 	case 'E':
@@ -511,23 +539,25 @@ print_insn_args (const char *oparg, insn_t l, bfd_vma pc, disassemble_info *info
 	      }
 
 	    if (riscv_csr_hash[csr] != NULL)
-	      print (info->stream, "%s", riscv_csr_hash[csr]);
+	      print (info->stream, dis_style_text, "%s", riscv_csr_hash[csr]);
 	    else
-	      print (info->stream, "0x%x", csr);
+	      print (info->stream, dis_style_text, "0x%x", csr);
 	    break;
 	  }
 
 	case 'Y':
-	  print (info->stream, "0x%x", (int)EXTRACT_OPERAND (RNUM, l));
+	  print (info->stream, dis_style_text, "0x%x",
+		 (int) EXTRACT_OPERAND (RNUM, l));
 	  break;
 
 	case 'Z':
-	  print (info->stream, "%d", rs1);
+	  print (info->stream, dis_style_text, "%d", rs1);
 	  break;
 
 	default:
 	  /* xgettext:c-format */
-	  print (info->stream, _("# internal error, undefined modifier (%c)"),
+	  print (info->stream, dis_style_text,
+		 _("# internal error, undefined modifier (%c)"),
 		 *opargStart);
 	  return;
 	}
@@ -627,14 +657,16 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
 	    continue;
 
 	  /* It's a match.  */
-	  (*info->fprintf_func) (info->stream, "%s", op->name);
+	  (*info->fprintf_styled_func) (info->stream, dis_style_mnemonic,
+					"%s", op->name);
 	  print_insn_args (op->args, word, memaddr, info);
 
 	  /* Try to disassemble multi-instruction addressing sequences.  */
 	  if (pd->print_addr != (bfd_vma)-1)
 	    {
 	      info->target = pd->print_addr;
-	      (*info->fprintf_func) (info->stream, " # ");
+	      (*info->fprintf_styled_func)
+		(info->stream, dis_style_comment_start, " # ");
 	      (*info->print_address_func) (info->target, info);
 	      pd->print_addr = -1;
 	    }
@@ -676,19 +708,24 @@ riscv_disassemble_insn (bfd_vma memaddr, insn_t word, disassemble_info *info)
     case 2:
     case 4:
     case 8:
-      (*info->fprintf_func) (info->stream, ".%dbyte\t0x%llx",
-                             insnlen, (unsigned long long) word);
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_assembler_directive, ".%dbyte\t", insnlen);
+      (*info->fprintf_styled_func) (info->stream, dis_style_immediate,
+				    "0x%llx", (unsigned long long) word);
       break;
     default:
       {
         int i;
-        (*info->fprintf_func) (info->stream, ".byte\t");
+	(*info->fprintf_styled_func)
+	  (info->stream, dis_style_assembler_directive, ".byte\t");
         for (i = 0; i < insnlen; ++i)
           {
             if (i > 0)
-              (*info->fprintf_func) (info->stream, ", ");
-            (*info->fprintf_func) (info->stream, "0x%02x",
-                                   (unsigned int) (word & 0xff));
+	      (*info->fprintf_styled_func) (info->stream, dis_style_text,
+					    ", ");
+	    (*info->fprintf_styled_func) (info->stream, dis_style_immediate,
+					  "0x%02x",
+					  (unsigned int) (word & 0xff));
             word >>= 8;
           }
       }
@@ -867,23 +904,35 @@ riscv_disassemble_data (bfd_vma memaddr ATTRIBUTE_UNUSED,
     {
     case 1:
       info->bytes_per_line = 6;
-      (*info->fprintf_func) (info->stream, ".byte\t0x%02llx",
-			     (unsigned long long) data);
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_assembler_directive, ".byte\t");
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_assembler_directive, "0x%02llx",
+	 (unsigned long long) data);
       break;
     case 2:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".short\t0x%04llx",
-			     (unsigned long long) data);
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_assembler_directive, ".short\t");
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_immediate, "0x%04llx",
+	 (unsigned long long) data);
       break;
     case 4:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".word\t0x%08llx",
-			     (unsigned long long) data);
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_assembler_directive, ".word\t");
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_immediate, "0x%08llx",
+	 (unsigned long long) data);
       break;
     case 8:
       info->bytes_per_line = 8;
-      (*info->fprintf_func) (info->stream, ".dword\t0x%016llx",
-			     (unsigned long long) data);
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_assembler_directive, ".dword\t");
+      (*info->fprintf_styled_func)
+	(info->stream, dis_style_immediate, "0x%016llx",
+	 (unsigned long long) data);
       break;
     default:
       abort ();
-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* [PATCHv2 3/3] opcodes/i386: partially implement disassembler style support
  2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
  2022-03-21 14:33   ` [PATCHv2 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
  2022-03-21 14:33   ` [PATCHv2 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
@ 2022-03-21 14:33   ` Andrew Burgess
  2022-03-24 17:08   ` [PATCHv2 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nick Clifton
  3 siblings, 0 replies; 21+ messages in thread
From: Andrew Burgess @ 2022-03-21 14:33 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

This commit adds partial support for disassembler styling in the i386
disassembler.

The i386 disassembler collects the instruction arguments into an array
of strings, and then loops over the array printing the arguments out
later on.  The problem is that by the time we print the arguments out
it's not obvious what the type of each argument is.

Obviously this can be fixed, but I'd like to not do that as part of
this commit, rather, I'd prefer to keep this commit as small as
possible to get the basic infrastructure in place, then we can improve
on this, to add additional styling, in later commits.

For now then, I think this commit should correctly style mnemonics,
some immediates, and comments.  Everything else will be printed as
plain text, which will include most instruction arguments, unless the
argument is printed as a symbol, by calling the print_address_func
callback.

Ignoring colours, there should be no other user visible changes in the
output of the disassembler in either objdump or gdb.

opcodes/ChangeLog:

	* disassembler.c (disassemble_init_for_target): Set
	created_styled_output for i386 based targets.
	* i386-dis.c: Changed throughout to use fprintf_styled_func
	instead of fprintf_func.
---
 opcodes/disassemble.c |  9 ++++++-
 opcodes/i386-dis.c    | 63 +++++++++++++++++++++++++++----------------
 2 files changed, 48 insertions(+), 24 deletions(-)

diff --git a/opcodes/disassemble.c b/opcodes/disassemble.c
index b8eed9695ea..81a90f20a9c 100644
--- a/opcodes/disassemble.c
+++ b/opcodes/disassemble.c
@@ -632,7 +632,14 @@ disassemble_init_for_target (struct disassemble_info * info)
       info->disassembler_needs_relocs = true;
       break;
 #endif
-
+#ifdef ARCH_i386
+    case bfd_arch_i386:
+    case bfd_arch_iamcu:
+    case bfd_arch_l1om:
+    case bfd_arch_k1om:
+      info->created_styled_output = true;
+      break;
+#endif
 #ifdef ARCH_ia64
     case bfd_arch_ia64:
       info->skip_zeroes = 16;
diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index a30bda0633b..d12ef4514fd 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -9402,8 +9402,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 
   if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
     {
-      (*ins->info->fprintf_func) (ins->info->stream,
-			     _("64-bit address is disabled"));
+      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
+					 _("64-bit address is disabled"));
       return -1;
     }
 
@@ -9456,12 +9456,16 @@ print_insn (bfd_vma pc, instr_info *ins)
 	{
 	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
 	  if (name != NULL)
-	    (*ins->info->fprintf_func) (ins->info->stream, "%s", name);
+	    (*ins->info->fprintf_styled_func)
+	      (ins->info->stream, dis_style_mnemonic, "%s", name);
 	  else
 	    {
 	      /* Just print the first byte as a .byte instruction.  */
-	      (*ins->info->fprintf_func) (ins->info->stream, ".byte 0x%x",
-				     (unsigned int) priv.the_buffer[0]);
+	      (*ins->info->fprintf_styled_func)
+		(ins->info->stream, dis_style_assembler_directive, ".byte ");
+	      (*ins->info->fprintf_styled_func)
+		(ins->info->stream, dis_style_immediate, "0x%x",
+		 (unsigned int) priv.the_buffer[0]);
 	    }
 
 	  return 1;
@@ -9479,10 +9483,10 @@ print_insn (bfd_vma pc, instr_info *ins)
       for (i = 0;
 	   i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_func) (ins->info->stream, "%s%s",
-			       i == 0 ? "" : " ",
-			       prefix_name (ins, ins->all_prefixes[i],
-					    sizeflag));
+	(*ins->info->fprintf_styled_func)
+	  (ins->info->stream, dis_style_mnemonic, "%s%s",
+	   (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
+					     sizeflag));
       return i;
     }
 
@@ -9497,10 +9501,11 @@ print_insn (bfd_vma pc, instr_info *ins)
       /* Handle ins->prefixes before fwait.  */
       for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_func) (ins->info->stream, "%s ",
-				    prefix_name (ins, ins->all_prefixes[i],
-						 sizeflag));
-      (*ins->info->fprintf_func) (ins->info->stream, "fwait");
+	(*ins->info->fprintf_styled_func)
+	  (ins->info->stream, dis_style_mnemonic, "%s ",
+	   prefix_name (ins, ins->all_prefixes[i], sizeflag));
+      (*ins->info->fprintf_styled_func)
+	(ins->info->stream, dis_style_mnemonic, "fwait");
       return i + 1;
     }
 
@@ -9649,14 +9654,16 @@ print_insn (bfd_vma pc, instr_info *ins)
      are all 0s in inverted form.  */
   if (ins->need_vex && ins->vex.register_specifier != 0)
     {
-      (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
+					 "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
   /* If EVEX.z is set, there must be an actual mask register in use.  */
   if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
     {
-      (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
+					 "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
@@ -9667,7 +9674,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 	 the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
       if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
 	{
-	  (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       ins->used_prefixes |= PREFIX_DATA;
@@ -9694,7 +9702,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 	  || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
 	      && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
 	{
-	  (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       break;
@@ -9744,13 +9753,15 @@ print_insn (bfd_vma pc, instr_info *ins)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	(*ins->info->fprintf_func) (ins->info->stream, "%s ", name);
+	(*ins->info->fprintf_styled_func)
+	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
   if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
     {
-      (*ins->info->fprintf_func) (ins->info->stream, "(bad)");
+      (*ins->info->fprintf_styled_func)
+	(ins->info->stream, dis_style_text, "(bad)");
       return MAX_CODE_LENGTH;
     }
 
@@ -9758,7 +9769,8 @@ print_insn (bfd_vma pc, instr_info *ins)
   for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
     oappend (ins, " ");
   oappend (ins, " ");
-  (*ins->info->fprintf_func) (ins->info->stream, "%s", ins->obuf);
+  (*ins->info->fprintf_styled_func)
+    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
 
   /* The enter and bound instructions are printed with operands in the same
      order as the intel book; everything else is printed in reverse order.  */
@@ -9797,7 +9809,8 @@ print_insn (bfd_vma pc, instr_info *ins)
     if (*op_txt[i])
       {
 	if (needcomma)
-	  (*ins->info->fprintf_func) (ins->info->stream, ",");
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, ",");
 	if (ins->op_index[i] != -1 && !ins->op_riprel[i])
 	  {
 	    bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
@@ -9813,14 +9826,18 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    (*ins->info->print_address_func) (target, ins->info);
 	  }
 	else
-	  (*ins->info->fprintf_func) (ins->info->stream, "%s", op_txt[i]);
+	  (*ins->info->fprintf_styled_func) (ins->info->stream,
+					     dis_style_text, "%s",
+					     op_txt[i]);
 	needcomma = 1;
       }
 
   for (i = 0; i < MAX_OPERANDS; i++)
     if (ins->op_index[i] != -1 && ins->op_riprel[i])
       {
-	(*ins->info->fprintf_func) (ins->info->stream, "        # ");
+	(*ins->info->fprintf_styled_func) (ins->info->stream,
+					   dis_style_comment_start,
+					   "        # ");
 	(*ins->info->print_address_func) ((bfd_vma)
 			(ins->start_pc + (ins->codep - ins->start_codep)
 			 + ins->op_address[ins->op_index[i]]), ins->info);
-- 
2.25.4


^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: [PATCHv2 0/3] disassembler syntax highlighting in objdump (via libopcodes)
  2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
                     ` (2 preceding siblings ...)
  2022-03-21 14:33   ` [PATCHv2 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
@ 2022-03-24 17:08   ` Nick Clifton
  3 siblings, 0 replies; 21+ messages in thread
From: Nick Clifton @ 2022-03-24 17:08 UTC (permalink / raw)
  To: Andrew Burgess, binutils

Hi Andrew,

> This series changes libopcodes so that this disassemblers can supply
> styling information with every piece of disassembly output, e.g. is
> this a register?  an address?  a mnemonic?  etc.

I like it!

>    objdump/opcodes: add syntax highlighting to disassembler output
>    opcodes/riscv: implement style support in the disassembler
>    opcodes/i386: partially implement disassembler style support
> 
>   binutils/NEWS              |   4 +
>   binutils/doc/binutils.texi |  11 ++
>   binutils/objdump.c         | 249 ++++++++++++++++++++++++++++++++-----
>   gdb/disasm.c               |  34 ++++-
>   gdb/disasm.h               |   7 ++
>   include/dis-asm.h          |  88 ++++++++++++-
>   opcodes/dis-init.c         |   5 +-
>   opcodes/disassemble.c      |  23 +++-
>   opcodes/i386-dis.c         |  63 ++++++----
>   opcodes/riscv-dis.c        | 193 +++++++++++++++++-----------
>   10 files changed, 541 insertions(+), 136 deletions(-)
> 

Patch series approved - please apply them all.

Cheers
   Nick


^ permalink raw reply	[flat|nested] 21+ messages in thread

end of thread, other threads:[~2022-03-24 17:08 UTC | newest]

Thread overview: 21+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2022-02-16 20:53 [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Andrew Burgess
2022-02-16 20:53 ` [PATCH 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
2022-02-28 15:54   ` Tom Tromey
2022-02-16 20:53 ` [PATCH 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
2022-02-19 10:24   ` Andrew Burgess
2022-02-16 20:53 ` [PATCH 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
2022-02-17  9:35   ` Jan Beulich
2022-02-17 16:15     ` Andrew Burgess
2022-02-17 16:29       ` Jan Beulich
2022-02-17 22:37         ` Andrew Burgess
2022-02-18  7:14           ` Jan Beulich
2022-02-19 10:54             ` Andrew Burgess
2022-02-21 13:08               ` Jan Beulich
2022-02-21 18:01                 ` Andrew Burgess
2022-02-17  3:57 ` [PATCH 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nelson Chu
2022-02-17 16:17   ` Andrew Burgess
2022-03-21 14:33 ` [PATCHv2 " Andrew Burgess
2022-03-21 14:33   ` [PATCHv2 1/3] objdump/opcodes: add syntax highlighting to disassembler output Andrew Burgess
2022-03-21 14:33   ` [PATCHv2 2/3] opcodes/riscv: implement style support in the disassembler Andrew Burgess
2022-03-21 14:33   ` [PATCHv2 3/3] opcodes/i386: partially implement disassembler style support Andrew Burgess
2022-03-24 17:08   ` [PATCHv2 0/3] disassembler syntax highlighting in objdump (via libopcodes) Nick Clifton

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).