public inbox for binutils@sourceware.org
* [PATCH 0/2] Disassembler styling for i386-dis.c
@ 2022-04-29 13:42 Andrew Burgess
  2022-04-29 13:42 ` [PATCH 1/2] objdump: fix styled printing of addresses Andrew Burgess
  2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
  0 siblings, 2 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-04-29 13:42 UTC
  To: binutils; +Cc: Andrew Burgess

This series builds on commit:

  commit 60a3da00bd5407f07d64dff82a4dae98230dfaac
  Date:   Sat Jan 22 11:38:18 2022 +0000
  
      objdump/opcodes: add syntax highlighting to disassembler output

which introduced a framework for disassembler styling.

In this series I extend the minimal styling that currently exists in
the i386 disassembler to add full styling for all instruction
operands.

The i386 disassembler is pretty complex, so it is quite possible that
I've missed some corners of it.  However, this should not cause any
major problems: in the worst case some output would (when styling is
on) end up with the wrong style, or no style at all.

That said, in the testing I've done, I'm not seeing anything that's
not styled any more, and the styling I do see looks reasonable -
though I don't claim to have manually checked every single i386
instruction.

If anyone spots any output that is not styling correctly, then please
just let me know, and I'm happy to get it sorted.

Thanks,
Andrew

---

Andrew Burgess (2):
  objdump: fix styled printing of addresses
  libopcodes: extend the styling within the i386 disassembler

 binutils/objdump.c |   9 +-
 opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
 2 files changed, 337 insertions(+), 243 deletions(-)

-- 
2.25.4



* [PATCH 1/2] objdump: fix styled printing of addresses
  2022-04-29 13:42 [PATCH 0/2] Disassembler styling for i386-dis.c Andrew Burgess
@ 2022-04-29 13:42 ` Andrew Burgess
  2022-05-02  7:14   ` Jan Beulich
  2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
  1 sibling, 1 reply; 29+ messages in thread
From: Andrew Burgess @ 2022-04-29 13:42 UTC
  To: binutils; +Cc: Andrew Burgess

Previous work to add styled disassembler output missed a case in
objdump_print_addr, which is fixed in this commit.
---
 binutils/objdump.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

diff --git a/binutils/objdump.c b/binutils/objdump.c
index 54c89a32db2..060a136efa4 100644
--- a/binutils/objdump.c
+++ b/binutils/objdump.c
@@ -1640,14 +1640,15 @@ objdump_print_addr (bfd_vma vma,
     {
       if (!no_addresses)
 	{
-	  (*inf->fprintf_func) (inf->stream, "0x");
+	  (*inf->fprintf_styled_func) (inf->stream, dis_style_address, "0x");
 	  objdump_print_value (vma, inf, skip_zeroes);
 	}
 
       if (display_file_offsets)
-	inf->fprintf_func (inf->stream, _(" (File Offset: 0x%lx)"),
-			   (long int) (inf->section->filepos
-				       + (vma - inf->section->vma)));
+	inf->fprintf_styled_func (inf->stream, dis_style_text,
+				  _(" (File Offset: 0x%lx)"),
+				  (long int) (inf->section->filepos
+					      + (vma - inf->section->vma)));
       return;
     }
 
-- 
2.25.4



* [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-04-29 13:42 [PATCH 0/2] Disassembler styling for i386-dis.c Andrew Burgess
  2022-04-29 13:42 ` [PATCH 1/2] objdump: fix styled printing of addresses Andrew Burgess
@ 2022-04-29 13:42 ` Andrew Burgess
  2022-04-29 18:16   ` Vladimir Mezentsev
                     ` (2 more replies)
  1 sibling, 3 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-04-29 13:42 UTC
  To: binutils; +Cc: Andrew Burgess

The i386 disassembler is pretty complex.  Most disassembly is done
indirectly; operands are built into buffers within a struct instr_info
instance, before finally being printed later in the disassembly
process.

Sometimes the operand buffers are built in a different order to the
order in which they will eventually be printed.

Each operand can contain multiple components, e.g. multiple registers,
immediates, other textual elements (commas, brackets, etc).

When looking for how to apply styling I guess the ideal solution would
be to move away from the operands being a single string that is built
up, and instead have each operand be a list of "parts", where each
part is some text and a style.  Then, when we eventually print the
operand we would loop over the parts and print each part with the
correct style.
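
For illustration only, the "list of parts" idea might look roughly like
the sketch below.  None of this is in the patch: enum sketch_style,
struct operand_part, struct operand, operand_add and operand_render are
all hypothetical names, and the "[style:text]" rendering merely stands
in for a real call to a styled print function.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for enum disassembler_style.  */
enum sketch_style { SK_TEXT, SK_REGISTER, SK_IMMEDIATE };

#define PART_TEXT_MAX 32
#define OPERAND_PARTS_MAX 8

/* One piece of an operand: some text plus the style to print it in.
   The text is copied, as a shared scratch buffer may be reused.  */
struct operand_part
{
  char text[PART_TEXT_MAX];
  enum sketch_style style;
};

/* An operand is an ordered list of parts rather than one string.  */
struct operand
{
  struct operand_part parts[OPERAND_PARTS_MAX];
  size_t count;
};

/* Append TEXT with STYLE as the next part of OP.  */
static void
operand_add (struct operand *op, const char *text, enum sketch_style style)
{
  assert (op->count < OPERAND_PARTS_MAX);
  assert (strlen (text) < PART_TEXT_MAX);
  strcpy (op->parts[op->count].text, text);
  op->parts[op->count].style = style;
  op->count++;
}

/* Loop over the parts of OP, writing each one to OUT as "[style:text]".
   A real implementation would print each part in its style instead.  */
static void
operand_render (const struct operand *op, char *out, size_t out_size)
{
  size_t used = 0;

  assert (out_size > 0);
  out[0] = '\0';
  for (size_t i = 0; i < op->count; i++)
    {
      int n = snprintf (out + used, out_size - used, "[%d:%s]",
			(int) op->parts[i].style, op->parts[i].text);
      assert (n >= 0 && (size_t) n < out_size - used);
      used += (size_t) n;
    }
}
```

Even this toy version needs its own storage and bounds handling for
every operand, which hints at how much of the existing buffer handling
would have to change.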

But it feels like a huge amount of work to move from where we are
now to that potentially ideal solution.  Plus, the above solution
would be pretty complex.

So, instead I propose a .... different solution here, one that works
with the existing infrastructure.

As each operand is built up, piece by piece, we pass through style
information.  This style information is then encoded into the operand
buffer (see below for details).  After this the code can continue to
operate as it does right now in order to manage the set of operand
buffers.

Then, as each operand is printed we can split the operand buffer into
chunks at the style marker boundaries, with each chunk being printed
in the correct style.

For encoding the style information I use the format "~%x~".  As far as
I can tell the '~' is not otherwise used in the i386 disassembler, so
this should serve as a unique marker.  To speed up writing and then
reading the style markers, I take advantage of the fact that there are
fewer than 16 styles, so I know the '%x' will only ever be a single hex
character.
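
As a standalone illustration of the encoding, the sketch below writes
a "~X~" marker followed by some text into a buffer.  append_marker and
append_styled are hypothetical names, styles are plain unsigned ints,
and the caller is assumed to have left enough space in the buffer, in
the same spirit as the existing operand buffers.

```c
#include <assert.h>
#include <string.h>

/* Write the marker "~X~" for STYLE at P, where X is a single lowercase
   hex digit, and return a pointer to the terminating NUL.  The caller
   must guarantee at least four bytes of space at P.  */
static char *
append_marker (char *p, unsigned int style)
{
  /* Only styles that fit in one hex digit can be encoded.  */
  assert (style < 16);

  *p++ = '~';
  *p++ = "0123456789abcdef"[style];
  *p++ = '~';
  *p = '\0';
  return p;
}

/* Write a marker for STYLE followed by TEXT at P, and return a pointer
   to the terminating NUL so that calls can be chained.  The caller
   must guarantee space for the marker, TEXT, and the NUL.  */
static char *
append_styled (char *p, const char *text, unsigned int style)
{
  size_t len = strlen (text);

  p = append_marker (p, style);
  memcpy (p, text, len + 1);
  return p + len;
}
```

Because the marker always occupies exactly three characters, the writer
needs no formatting call and the reader can skip it with fixed offsets.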

In some (not very scientific) benchmarking on my machine,
disassembling a reasonably large (142M) shared library, I'm not seeing
any significant slowdown in disassembler speed with this change.

Most instructions are now being fully syntax highlighted when I
disassemble using the --disassembler-color=extended-color option.  I'm
sure that there are probably still a few corner cases that need fixing
up, but we can come back to them later I think.

When disassembler syntax highlighting is not being used, then there
should be no user visible changes after this commit.
---
 opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
 1 file changed, 332 insertions(+), 239 deletions(-)

diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 1e3266329c1..c94d316a03f 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -42,12 +42,14 @@
 #include <setjmp.h>
 typedef struct instr_info instr_info;
 
+#define STYLE_BUFFER_SIZE 10
+
 static int print_insn (bfd_vma, instr_info *);
 static void dofloat (instr_info *, int);
 static void OP_ST (instr_info *, int, int);
 static void OP_STi (instr_info *, int, int);
 static int putop (instr_info *, const char *, int);
-static void oappend (instr_info *, const char *);
+static void oappend (instr_info *, const char *, enum disassembler_style);
 static void append_seg (instr_info *);
 static void OP_indirE (instr_info *, int, int);
 static void print_operand_value (instr_info *, char *, int, bfd_vma);
@@ -166,6 +168,8 @@ struct instr_info
   char *obufp;
   char *mnemonicendp;
   char scratchbuf[100];
+  char style_buffer[STYLE_BUFFER_SIZE];
+  char staging_area[100];
   unsigned char *start_codep;
   unsigned char *insn_codep;
   unsigned char *codep;
@@ -248,6 +252,8 @@ struct instr_info
 
   enum x86_64_isa isa64;
 
+  int (*printf) (instr_info *ins, enum disassembler_style style,
+		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
 };
 
 /* Mark parts used in the REX prefix.  When we are testing for
@@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
 /* Like oappend (below), but S is a string starting with '%'.
    In Intel syntax, the '%' is elided.  */
 static void
-oappend_maybe_intel (instr_info *ins, const char *s)
+oappend_maybe_intel (instr_info *ins, const char *s,
+		     enum disassembler_style style)
 {
-  oappend (ins, s + ins->intel_syntax);
+  oappend (ins, s + ins->intel_syntax, style);
+}
+
+/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
+   STYLE is the default style to use in the fprintf_styled_func calls,
+   however, FMT might include embedded style markers (see oappend_style),
+   these embedded markers are not printed, but instead change the style
+   used in the next fprintf_styled_func call.
+
+   Return non-zero to indicate the print call was a success.  */
+
+static int ATTRIBUTE_PRINTF_3
+i386_dis_printf (instr_info *ins, enum disassembler_style style,
+		 const char *fmt, ...)
+{
+  va_list ap;
+  enum disassembler_style curr_style = style;
+  char *start, *curr;
+
+  va_start (ap, fmt);
+  vsnprintf (ins->staging_area, 100, fmt, ap);
+  va_end (ap);
+
+  start = curr = ins->staging_area;
+
+  do
+    {
+      if (*curr == '\0' || *curr == '~')
+	{
+	  /* Output content between our START position and CURR.  */
+	  int len = curr - start;
+	  (*ins->info->fprintf_styled_func) (ins->info->stream, curr_style,
+					     "%.*s", len, start);
+	  if (*curr == '\0')
+	    break;
+
+	  /* Update the CURR_STYLE, it is possible here that if the input
+	     is corrupted in some way, then we may set CURR_STYLE to an
+	     invalid value.  Don't worry though, we check for that in a
+	     subsequent if block.  */
+	  ++curr;
+	  if (*curr >= '0' && *curr <= '9')
+	    curr_style = (enum disassembler_style) (*curr - '0');
+	  else if (*curr >= 'a' && *curr <= 'f')
+	    curr_style = (enum disassembler_style) (*curr - 'a' + 10);
+	  else
+	    curr_style = dis_style_text;
+
+	  /* Skip over the hex character, and the closing '~'.  Also
+	     validate that CURR_STYLE is set to a valid value.  */
+	  ++curr;
+	  if (*curr != '~' || curr_style > dis_style_comment_start)
+	    curr_style = dis_style_text;
+	  ++curr;
+
+	  /* Reset the START and CURR pointers to after the style marker.  */
+	  start = curr;
+	}
+      else
+	++curr;
+    }
+  while (true);
+
+  return 1;
 }
 
 static int
@@ -9317,6 +9387,7 @@ print_insn (bfd_vma pc, instr_info *ins)
   struct dis_private priv;
   int prefix_length;
 
+  ins->printf = i386_dis_printf;
   ins->isa64 = 0;
   ins->intel_mnemonic = !SYSV386_COMPAT;
   ins->op_is_jump = false;
@@ -9401,8 +9472,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 
   if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 _("64-bit address is disabled"));
+      ins->printf (ins, dis_style_text, _("64-bit address is disabled"));
       return -1;
     }
 
@@ -9451,16 +9521,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	{
 	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
 	  if (name != NULL)
-	    (*ins->info->fprintf_styled_func)
-	      (ins->info->stream, dis_style_mnemonic, "%s", name);
+	    ins->printf (ins, dis_style_mnemonic, "%s", name);
 	  else
 	    {
 	      /* Just print the first byte as a .byte instruction.  */
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_assembler_directive, ".byte ");
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_immediate, "0x%x",
-		 (unsigned int) priv.the_buffer[0]);
+	      ins->printf (ins, dis_style_assembler_directive,
+			   ".byte ");
+	      ins->printf (ins, dis_style_immediate, "0x%x",
+			   (unsigned int) priv.the_buffer[0]);
 	    }
 
 	  return 1;
@@ -9478,10 +9546,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       for (i = 0;
 	   i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s%s",
-	   (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
-					     sizeflag));
+	ins->printf (ins, dis_style_mnemonic, "%s%s",
+		     (i == 0 ? "" : " "),
+		     prefix_name (ins, ins->all_prefixes[i], sizeflag));
       return i;
     }
 
@@ -9496,11 +9563,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       /* Handle ins->prefixes before fwait.  */
       for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ",
-	   prefix_name (ins, ins->all_prefixes[i], sizeflag));
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_mnemonic, "fwait");
+	ins->printf (ins, dis_style_mnemonic, "%s ",
+		     prefix_name (ins, ins->all_prefixes[i], sizeflag));
+      ins->printf (ins, dis_style_mnemonic, "fwait");
       return i + 1;
     }
 
@@ -9569,14 +9634,15 @@ print_insn (bfd_vma pc, instr_info *ins)
 		  /* Don't print {%k0}.  */
 		  if (ins->vex.mask_register_specifier)
 		    {
-		      oappend (ins, "{");
+		      oappend (ins, "{", dis_style_text);
 		      oappend_maybe_intel (ins,
 					   att_names_mask
-					   [ins->vex.mask_register_specifier]);
-		      oappend (ins, "}");
+					   [ins->vex.mask_register_specifier],
+					   dis_style_text);
+		      oappend (ins, "}", dis_style_text);
 		    }
 		  if (ins->vex.zeroing)
-		    oappend (ins, "{z}");
+		    oappend (ins, "{z}", dis_style_text);
 
 		  /* S/G insns require a mask and don't allow
 		     zeroing-masking.  */
@@ -9584,7 +9650,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 		       || dp->op[0].bytemode == vex_vsib_q_w_dq_mode)
 		      && (ins->vex.mask_register_specifier == 0
 			  || ins->vex.zeroing))
-		    oappend (ins, "/(bad)");
+		    oappend (ins, "/(bad)", dis_style_text);
 		}
 	    }
 
@@ -9598,8 +9664,8 @@ print_insn (bfd_vma pc, instr_info *ins)
 		  ins->obufp = ins->op_out[i];
 		  if (*ins->obufp)
 		    continue;
-		  oappend (ins, names_rounding[ins->vex.ll]);
-		  oappend (ins, "bad}");
+		  oappend (ins, names_rounding[ins->vex.ll], dis_style_text);
+		  oappend (ins, "bad}", dis_style_text);
 		  break;
 		}
 	    }
@@ -9649,16 +9715,14 @@ print_insn (bfd_vma pc, instr_info *ins)
      are all 0s in inverted form.  */
   if (ins->need_vex && ins->vex.register_specifier != 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      ins->printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
   /* If EVEX.z is set, there must be an actual mask register in use.  */
   if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      ins->printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
@@ -9669,8 +9733,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	 the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
       if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  ins->printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       ins->used_prefixes |= PREFIX_DATA;
@@ -9697,8 +9760,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	  || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
 	      && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  ins->printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       break;
@@ -9748,24 +9810,28 @@ print_insn (bfd_vma pc, instr_info *ins)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
+	ins->printf (ins, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
   if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
     {
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_text, "(bad)");
+      ins->printf (ins, dis_style_text, "(bad)");
       return MAX_CODE_LENGTH;
     }
 
-  ins->obufp = ins->mnemonicendp;
-  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
-    oappend (ins, " ");
-  oappend (ins, " ");
-  (*ins->info->fprintf_styled_func)
-    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
+  i = strlen (ins->obuf);
+  if (ins->mnemonicendp == ins->obuf + i)
+    {
+      i += prefix_length;
+      if (i < 6)
+	i = 6 - i + 1;
+      else
+	i = 1;
+    }
+  else
+    i = 0;
+  ins->printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
 
   /* The enter and bound instructions are printed with operands in the same
      order as the intel book; everything else is printed in reverse order.  */
@@ -9804,8 +9870,7 @@ print_insn (bfd_vma pc, instr_info *ins)
     if (*op_txt[i])
       {
 	if (needcomma)
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, ",");
+	  ins->printf (ins, dis_style_text, ",");
 	if (ins->op_index[i] != -1 && !ins->op_riprel[i])
 	  {
 	    bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
@@ -9821,18 +9886,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    (*ins->info->print_address_func) (target, ins->info);
 	  }
 	else
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "%s",
-					     op_txt[i]);
+	  ins->printf (ins, dis_style_text, "%s", op_txt[i]);
 	needcomma = 1;
       }
 
   for (i = 0; i < MAX_OPERANDS; i++)
     if (ins->op_index[i] != -1 && ins->op_riprel[i])
       {
-	(*ins->info->fprintf_styled_func) (ins->info->stream,
-					   dis_style_comment_start,
-					   "        # ");
+	ins->printf (ins, dis_style_comment_start, "        # ");
 	(*ins->info->print_address_func) ((bfd_vma)
 			(ins->start_pc + (ins->codep - ins->start_codep)
 			 + ins->op_address[ins->op_index[i]]), ins->info);
@@ -10217,15 +10278,18 @@ static void
 OP_ST (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
        int sizeflag ATTRIBUTE_UNUSED)
 {
-  oappend_maybe_intel (ins, "%st");
+  oappend_maybe_intel (ins, "%st", dis_style_text);
 }
 
 static void
 OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
-  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, "%st", dis_style_text);
+  oappend (ins, "(", dis_style_text);
+  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
+  oappend (ins, ins->scratchbuf, dis_style_immediate);
+  oappend (ins, ")", dis_style_text);
 }
 
 /* Capital letters in template are macros.  */
@@ -10329,7 +10393,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 		if (!ins->vex.evex || ins->vex.w)
 		  *ins->obufp++ = 'd';
 		else
-		  oappend (ins, "{bad}");
+		  oappend (ins, "{bad}", dis_style_text);
 		break;
 	      default:
 		abort ();
@@ -10424,7 +10488,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 	      if (!ins->vex.w)
 		*ins->obufp++ = 'h';
 	      else
-		oappend (ins, "{bad}");
+		oappend (ins, "{bad}", dis_style_text);
 	    }
 	  else
 	    abort ();
@@ -10608,7 +10672,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
 	      if (!ins->vex.evex || !ins->vex.w)
 		*ins->obufp++ = 's';
 	      else
-		oappend (ins, "{bad}");
+		oappend (ins, "{bad}", dis_style_text);
 	      break;
 	    default:
 	      abort ();
@@ -10772,12 +10836,47 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
   return 0;
 }
 
+/* Add a style marker "~X~" to *INS->obufp that encodes STYLE.  This
+   assumes that the buffer pointed to by INS->obufp has space.  In the
+   style marker 'X' is replaced with a single hex character that represents
+   STYLE.  */
+
+static void
+oappend_style (instr_info *ins, enum disassembler_style style)
+{
+  int num = (int) style;
+
+  /* We currently assume that STYLE can be encoded as a single hex
+     character.  If more styles are added then this might start to fail,
+     and we'll need to expand this code.  */
+  if (num > 0xf)
+    abort ();
+
+  *ins->obufp++ = '~';
+  *ins->obufp++ = (num < 10 ? ('0' + num)
+		   : ((num < 16) ? ('a' + (num - 10)) : '0'));
+  *ins->obufp++ = '~';
+  *ins->obufp = '\0';
+}
+
 static void
-oappend (instr_info *ins, const char *s)
+oappend (instr_info *ins, const char *s, enum disassembler_style style)
 {
+  oappend_style (ins, style);
   ins->obufp = stpcpy (ins->obufp, s);
 }
 
+/* Add a single character C to the buffer pointed to by INS->obufp, marking
+   the style for the character as STYLE.  */
+
+static void
+oappend_char (instr_info *ins, const char c, enum disassembler_style style)
+{
+  oappend_style (ins, style);
+  *ins->obufp++ = c;
+  *ins->obufp = '\0';
+}
+
 static void
 append_seg (instr_info *ins)
 {
@@ -10789,33 +10888,34 @@ append_seg (instr_info *ins)
   switch (ins->active_seg_prefix)
     {
     case PREFIX_CS:
-      oappend_maybe_intel (ins, "%cs:");
+      oappend_maybe_intel (ins, "%cs", dis_style_register);
       break;
     case PREFIX_DS:
-      oappend_maybe_intel (ins, "%ds:");
+      oappend_maybe_intel (ins, "%ds", dis_style_register);
       break;
     case PREFIX_SS:
-      oappend_maybe_intel (ins, "%ss:");
+      oappend_maybe_intel (ins, "%ss", dis_style_register);
       break;
     case PREFIX_ES:
-      oappend_maybe_intel (ins, "%es:");
+      oappend_maybe_intel (ins, "%es", dis_style_register);
       break;
     case PREFIX_FS:
-      oappend_maybe_intel (ins, "%fs:");
+      oappend_maybe_intel (ins, "%fs", dis_style_register);
       break;
     case PREFIX_GS:
-      oappend_maybe_intel (ins, "%gs:");
+      oappend_maybe_intel (ins, "%gs", dis_style_register);
       break;
     default:
       break;
     }
+  oappend_char (ins, ':', dis_style_text);
 }
 
 static void
 OP_indirE (instr_info *ins, int bytemode, int sizeflag)
 {
   if (!ins->intel_syntax)
-    oappend (ins, "*");
+    oappend (ins, "*", dis_style_text);
   OP_E (ins, bytemode, sizeflag);
 }
 
@@ -10931,14 +11031,14 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
 	  case x_mode:
 	  case evex_half_bcst_xmmq_mode:
 	    if (ins->vex.w)
-	      oappend (ins, "QWORD PTR ");
+	      oappend (ins, "QWORD PTR ", dis_style_text);
 	    else
-	      oappend (ins, "DWORD PTR ");
+	      oappend (ins, "DWORD PTR ", dis_style_text);
 	    break;
 	  case xh_mode:
 	  case evex_half_bcst_xmmqh_mode:
 	  case evex_half_bcst_xmmqdh_mode:
-	    oappend (ins, "WORD PTR ");
+	    oappend (ins, "WORD PTR ", dis_style_text);
 	    break;
 	  default:
 	    ins->vex.no_broadcast = true;
@@ -10951,17 +11051,17 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
     case b_mode:
     case b_swap_mode:
     case db_mode:
-      oappend (ins, "BYTE PTR ");
+      oappend (ins, "BYTE PTR ", dis_style_text);
       break;
     case w_mode:
     case w_swap_mode:
     case dw_mode:
-      oappend (ins, "WORD PTR ");
+      oappend (ins, "WORD PTR ", dis_style_text);
       break;
     case indir_v_mode:
       if (ins->address_mode == mode_64bit && ins->isa64 == intel64)
 	{
-	  oappend (ins, "QWORD PTR ");
+	  oappend (ins, "QWORD PTR ", dis_style_text);
 	  break;
 	}
       /* Fall through.  */
@@ -10969,7 +11069,7 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
       if (ins->address_mode == mode_64bit && ((sizeflag & DFLAG)
 					      || (ins->rex & REX_W)))
 	{
-	  oappend (ins, "QWORD PTR ");
+	  oappend (ins, "QWORD PTR ", dis_style_text);
 	  break;
 	}
       /* Fall through.  */
@@ -10978,62 +11078,62 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
     case dq_mode:
       USED_REX (REX_W);
       if (ins->rex & REX_W)
-	oappend (ins, "QWORD PTR ");
+	oappend (ins, "QWORD PTR ", dis_style_text);
       else if (bytemode == dq_mode)
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       else
 	{
 	  if (sizeflag & DFLAG)
-	    oappend (ins, "DWORD PTR ");
+	    oappend (ins, "DWORD PTR ", dis_style_text);
 	  else
-	    oappend (ins, "WORD PTR ");
+	    oappend (ins, "WORD PTR ", dis_style_text);
 	  ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
 	}
       break;
     case z_mode:
       if ((ins->rex & REX_W) || (sizeflag & DFLAG))
 	*ins->obufp++ = 'D';
-      oappend (ins, "WORD PTR ");
+      oappend (ins, "WORD PTR ", dis_style_text);
       if (!(ins->rex & REX_W))
 	ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
       break;
     case a_mode:
       if (sizeflag & DFLAG)
-	oappend (ins, "QWORD PTR ");
+	oappend (ins, "QWORD PTR ", dis_style_text);
       else
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
       break;
     case movsxd_mode:
       if (!(sizeflag & DFLAG) && ins->isa64 == intel64)
-	oappend (ins, "WORD PTR ");
+	oappend (ins, "WORD PTR ", dis_style_text);
       else
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
       break;
     case d_mode:
     case d_swap_mode:
-      oappend (ins, "DWORD PTR ");
+      oappend (ins, "DWORD PTR ", dis_style_text);
       break;
     case q_mode:
     case q_swap_mode:
-      oappend (ins, "QWORD PTR ");
+      oappend (ins, "QWORD PTR ", dis_style_text);
       break;
     case m_mode:
       if (ins->address_mode == mode_64bit)
-	oappend (ins, "QWORD PTR ");
+	oappend (ins, "QWORD PTR ", dis_style_text);
       else
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       break;
     case f_mode:
       if (sizeflag & DFLAG)
-	oappend (ins, "FWORD PTR ");
+	oappend (ins, "FWORD PTR ", dis_style_text);
       else
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
       break;
     case t_mode:
-      oappend (ins, "TBYTE PTR ");
+      oappend (ins, "TBYTE PTR ", dis_style_text);
       break;
     case x_mode:
     case xh_mode:
@@ -11046,26 +11146,26 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
 	  switch (ins->vex.length)
 	    {
 	    case 128:
-	      oappend (ins, "XMMWORD PTR ");
+	      oappend (ins, "XMMWORD PTR ", dis_style_text);
 	      break;
 	    case 256:
-	      oappend (ins, "YMMWORD PTR ");
+	      oappend (ins, "YMMWORD PTR ", dis_style_text);
 	      break;
 	    case 512:
-	      oappend (ins, "ZMMWORD PTR ");
+	      oappend (ins, "ZMMWORD PTR ", dis_style_text);
 	      break;
 	    default:
 	      abort ();
 	    }
 	}
       else
-	oappend (ins, "XMMWORD PTR ");
+	oappend (ins, "XMMWORD PTR ", dis_style_text);
       break;
     case xmm_mode:
-      oappend (ins, "XMMWORD PTR ");
+      oappend (ins, "XMMWORD PTR ", dis_style_text);
       break;
     case ymm_mode:
-      oappend (ins, "YMMWORD PTR ");
+      oappend (ins, "YMMWORD PTR ", dis_style_text);
       break;
     case xmmq_mode:
     case evex_half_bcst_xmmqh_mode:
@@ -11076,13 +11176,13 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
       switch (ins->vex.length)
 	{
 	case 128:
-	  oappend (ins, "QWORD PTR ");
+	  oappend (ins, "QWORD PTR ", dis_style_text);
 	  break;
 	case 256:
-	  oappend (ins, "XMMWORD PTR ");
+	  oappend (ins, "XMMWORD PTR ", dis_style_text);
 	  break;
 	case 512:
-	  oappend (ins, "YMMWORD PTR ");
+	  oappend (ins, "YMMWORD PTR ", dis_style_text);
 	  break;
 	default:
 	  abort ();
@@ -11095,13 +11195,13 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
       switch (ins->vex.length)
 	{
 	case 128:
-	  oappend (ins, "WORD PTR ");
+	  oappend (ins, "WORD PTR ", dis_style_text);
 	  break;
 	case 256:
-	  oappend (ins, "DWORD PTR ");
+	  oappend (ins, "DWORD PTR ", dis_style_text);
 	  break;
 	case 512:
-	  oappend (ins, "QWORD PTR ");
+	  oappend (ins, "QWORD PTR ", dis_style_text);
 	  break;
 	default:
 	  abort ();
@@ -11115,13 +11215,13 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
       switch (ins->vex.length)
 	{
 	case 128:
-	  oappend (ins, "DWORD PTR ");
+	  oappend (ins, "DWORD PTR ", dis_style_text);
 	  break;
 	case 256:
-	  oappend (ins, "QWORD PTR ");
+	  oappend (ins, "QWORD PTR ", dis_style_text);
 	  break;
 	case 512:
-	  oappend (ins, "XMMWORD PTR ");
+	  oappend (ins, "XMMWORD PTR ", dis_style_text);
 	  break;
 	default:
 	  abort ();
@@ -11134,45 +11234,45 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
       switch (ins->vex.length)
 	{
 	case 128:
-	  oappend (ins, "QWORD PTR ");
+	  oappend (ins, "QWORD PTR ", dis_style_text);
 	  break;
 	case 256:
-	  oappend (ins, "YMMWORD PTR ");
+	  oappend (ins, "YMMWORD PTR ", dis_style_text);
 	  break;
 	case 512:
-	  oappend (ins, "ZMMWORD PTR ");
+	  oappend (ins, "ZMMWORD PTR ", dis_style_text);
 	  break;
 	default:
 	  abort ();
 	}
       break;
     case o_mode:
-      oappend (ins, "OWORD PTR ");
+      oappend (ins, "OWORD PTR ", dis_style_text);
       break;
     case vex_vsib_d_w_dq_mode:
     case vex_vsib_q_w_dq_mode:
       if (!ins->need_vex)
 	abort ();
       if (ins->vex.w)
-	oappend (ins, "QWORD PTR ");
+	oappend (ins, "QWORD PTR ", dis_style_text);
       else
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       break;
     case mask_bd_mode:
       if (!ins->need_vex || ins->vex.length != 128)
 	abort ();
       if (ins->vex.w)
-	oappend (ins, "DWORD PTR ");
+	oappend (ins, "DWORD PTR ", dis_style_text);
       else
-	oappend (ins, "BYTE PTR ");
+	oappend (ins, "BYTE PTR ", dis_style_text);
       break;
     case mask_mode:
       if (!ins->need_vex)
 	abort ();
       if (ins->vex.w)
-	oappend (ins, "QWORD PTR ");
+	oappend (ins, "QWORD PTR ", dis_style_text);
       else
-	oappend (ins, "WORD PTR ");
+	oappend (ins, "WORD PTR ", dis_style_text);
       break;
     case v_bnd_mode:
     case v_bndmk_mode:
@@ -11221,7 +11321,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case bnd_swap_mode:
       if (reg > 0x3)
 	{
-	  oappend (ins, "(bad)");
+	  oappend (ins, "(bad)", dis_style_text);
 	  return;
 	}
       names = att_names_bnd;
@@ -11285,7 +11385,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case mask_mode:
       if (reg > 0x7)
 	{
-	  oappend (ins, "(bad)");
+	  oappend (ins, "(bad)", dis_style_text);
 	  return;
 	}
       names = att_names_mask;
@@ -11293,10 +11393,10 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
     case 0:
       return;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       return;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -11488,7 +11588,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      || bytemode == vex_vsib_q_w_dq_mode
 	      || bytemode == vex_sibmem_mode)
 	    {
-	      oappend (ins, "(bad)");
+	      oappend (ins, "(bad)", dis_style_text);
 	      return;
 	    }
 	}
@@ -11505,7 +11605,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      disp = get32s (ins);
 	      if (riprel && bytemode == v_bndmk_mode)
 		{
-		  oappend (ins, "(bad)");
+		  oappend (ins, "(bad)", dis_style_text);
 		  return;
 		}
 	    }
@@ -11560,11 +11660,14 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      print_displacement (ins, ins->scratchbuf, disp);
 	    else
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
-	    oappend (ins, ins->scratchbuf);
+	    oappend (ins, ins->scratchbuf, dis_style_address_offset);
 	    if (riprel)
 	      {
 		set_op (ins, disp, 1);
-		oappend (ins, !addr32flag ? "(%rip)" : "(%eip)");
+		oappend_char (ins, '(', dis_style_text);
+		oappend (ins, !addr32flag ? "%rip" : "%eip",
+			 dis_style_register);
+		oappend_char (ins, ')', dis_style_text);
 	      }
 	  }
 
@@ -11578,17 +11681,17 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (havedisp || (ins->intel_syntax && riprel))
 	{
-	  *ins->obufp++ = ins->open_char;
+	  oappend_char (ins, ins->open_char, dis_style_text);
 	  if (ins->intel_syntax && riprel)
 	    {
 	      set_op (ins, disp, 1);
-	      oappend (ins, !addr32flag ? "rip" : "eip");
+	      oappend (ins, !addr32flag ? "rip" : "eip", dis_style_register);
 	    }
-	  *ins->obufp = '\0';
 	  if (havebase)
 	    oappend_maybe_intel (ins,
 				 (ins->address_mode == mode_64bit && !addr32flag
-				  ? att_names64 : att_names32)[rbase]);
+				  ? att_names64 : att_names32)[rbase],
+				 dis_style_register);
 	  if (ins->has_sib)
 	    {
 	      /* ESP/RSP won't allow index.  If base isn't ESP/RSP,
@@ -11599,41 +11702,34 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		  || (havebase && base != ESP_REG_NUM))
 		{
 		  if (!ins->intel_syntax || havebase)
-		    {
-		      *ins->obufp++ = ins->separator_char;
-		      *ins->obufp = '\0';
-		    }
+		    oappend_char (ins, ins->separator_char, dis_style_text);
 		  if (indexes)
 		    {
 		      if (ins->address_mode == mode_64bit || vindex < 16)
-			oappend_maybe_intel (ins, indexes[vindex]);
+			oappend_maybe_intel (ins, indexes[vindex],
+					     dis_style_register);
 		      else
-			oappend (ins, "(bad)");
+			oappend (ins, "(bad)", dis_style_text);
 		    }
 		  else
 		    oappend_maybe_intel (ins,
 					 ins->address_mode == mode_64bit
 					 && !addr32flag ? att_index64
-							: att_index32);
+					 : att_index32, dis_style_text);
 
-		  *ins->obufp++ = ins->scale_char;
-		  *ins->obufp = '\0';
+		  oappend_char (ins, ins->scale_char, dis_style_text);
 		  sprintf (ins->scratchbuf, "%d", 1 << scale);
-		  oappend (ins, ins->scratchbuf);
+		  oappend (ins, ins->scratchbuf, dis_style_immediate);
 		}
 	    }
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || base == 5))
 	    {
 	      if (!havedisp || (bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		  oappend_char (ins, '+', dis_style_text);
 	      else if (ins->modrm.mod != 1 && disp != -disp)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-', dis_style_text);
 		  disp = - (bfd_signed_vma) disp;
 		}
 
@@ -11641,11 +11737,10 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		print_displacement (ins, ins->scratchbuf, disp);
 	      else
 		print_operand_value (ins, ins->scratchbuf, 1, disp);
-	      oappend (ins, ins->scratchbuf);
+	      oappend (ins, ins->scratchbuf, dis_style_text);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char, dis_style_text);
 
 	  if (check_gather)
 	    {
@@ -11657,7 +11752,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      if (!ins->vex.r)
 	        modrm_reg += 16;
 	      if (vindex == modrm_reg)
-		oappend (ins, "/(bad)");
+		oappend (ins, "/(bad)", dis_style_text);
 	    }
 	}
       else if (ins->intel_syntax)
@@ -11666,11 +11761,12 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	    {
 	      if (!ins->active_seg_prefix)
 		{
-		  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
-		  oappend (ins, ":");
+		  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
+				       dis_style_text);
+		  oappend (ins, ":", dis_style_text);
 		}
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
-	      oappend (ins, ins->scratchbuf);
+	      oappend (ins, ins->scratchbuf, dis_style_text);
 	    }
 	}
     }
@@ -11681,7 +11777,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	   || bytemode == vex_vsib_d_w_dq_mode
 	   || bytemode == vex_vsib_q_w_dq_mode)
     {
-      oappend (ins, "(bad)");
+      oappend (ins, "(bad)", dis_style_text);
       return;
     }
   else
@@ -11717,47 +11813,42 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	if (ins->modrm.mod != 0 || ins->modrm.rm == 6)
 	  {
 	    print_displacement (ins, ins->scratchbuf, disp);
-	    oappend (ins, ins->scratchbuf);
+	    oappend (ins, ins->scratchbuf, dis_style_text);
 	  }
 
       if (ins->modrm.mod != 0 || ins->modrm.rm != 6)
 	{
-	  *ins->obufp++ = ins->open_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->open_char, dis_style_text);
 	  oappend (ins,
 		   (ins->intel_syntax ? intel_index16
-				      : att_index16)[ins->modrm.rm]);
+		    : att_index16)[ins->modrm.rm], dis_style_text);
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || ins->modrm.rm == 6))
 	    {
 	      if ((bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		oappend_char (ins, '+', dis_style_text);
 	      else if (ins->modrm.mod != 1)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-', dis_style_text);
 		  disp = - (bfd_signed_vma) disp;
 		}
 
 	      print_displacement (ins, ins->scratchbuf, disp);
-	      oappend (ins, ins->scratchbuf);
+	      oappend (ins, ins->scratchbuf, dis_style_text);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char, dis_style_text);
 	}
       else if (ins->intel_syntax)
 	{
 	  if (!ins->active_seg_prefix)
 	    {
-	      oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
-	      oappend (ins, ":");
+	      oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
+				   dis_style_text);
+	      oappend (ins, ":", dis_style_text);
 	    }
 	  print_operand_value (ins, ins->scratchbuf, 1, disp & 0xffff);
-	  oappend (ins, ins->scratchbuf);
+	  oappend (ins, ins->scratchbuf, dis_style_text);
 	}
     }
   if (ins->vex.b)
@@ -11773,19 +11864,19 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	  if (bytemode == xh_mode)
 	    {
 	      if (ins->vex.w)
-		oappend (ins, "{bad}");
+		oappend (ins, "{bad}", dis_style_text);
 	      else
 		{
 		  switch (ins->vex.length)
 		    {
 		    case 128:
-		      oappend (ins, "{1to8}");
+		      oappend (ins, "{1to8}", dis_style_text);
 		      break;
 		    case 256:
-		      oappend (ins, "{1to16}");
+		      oappend (ins, "{1to16}", dis_style_text);
 		      break;
 		    case 512:
-		      oappend (ins, "{1to32}");
+		      oappend (ins, "{1to32}", dis_style_text);
 		      break;
 		    default:
 		      abort ();
@@ -11802,13 +11893,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      switch (ins->vex.length)
 		{
 		case 128:
-		  oappend (ins, "{1to2}");
+		  oappend (ins, "{1to2}", dis_style_text);
 		  break;
 		case 256:
-		  oappend (ins, "{1to4}");
+		  oappend (ins, "{1to4}", dis_style_text);
 		  break;
 		case 512:
-		  oappend (ins, "{1to8}");
+		  oappend (ins, "{1to8}", dis_style_text);
 		  break;
 		default:
 		  abort ();
@@ -11820,13 +11911,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      switch (ins->vex.length)
 		{
 		case 128:
-		  oappend (ins, "{1to4}");
+		  oappend (ins, "{1to4}", dis_style_text);
 		  break;
 		case 256:
-		  oappend (ins, "{1to8}");
+		  oappend (ins, "{1to8}", dis_style_text);
 		  break;
 		case 512:
-		  oappend (ins, "{1to16}");
+		  oappend (ins, "{1to16}", dis_style_text);
 		  break;
 		default:
 		  abort ();
@@ -11836,7 +11927,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	    ins->vex.no_broadcast = true;
 	}
       if (ins->vex.no_broadcast)
-	oappend (ins, "{bad}");
+	oappend (ins, "{bad}", dis_style_text);
     }
 }
 
@@ -11866,7 +11957,7 @@ OP_G (instr_info *ins, int bytemode, int sizeflag)
 {
   if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
     {
-      oappend (ins, "(bad)");
+      oappend (ins, "(bad)", dis_style_text);
       return;
     }
 
@@ -11969,7 +12060,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
     {
     case es_reg: case ss_reg: case cs_reg:
     case ds_reg: case fs_reg: case gs_reg:
-      oappend_maybe_intel (ins, att_names_seg[code - es_reg]);
+      oappend_maybe_intel (ins, att_names_seg[code - es_reg], dis_style_text);
       return;
     }
 
@@ -12019,10 +12110,10 @@ OP_REG (instr_info *ins, int code, int sizeflag)
 	}
       break;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_maybe_intel (ins, s, dis_style_register);
 }
 
 static void
@@ -12035,7 +12126,7 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
     case indir_dx_reg:
       if (!ins->intel_syntax)
 	{
-	  oappend (ins, "(%dx)");
+	  oappend (ins, "(%dx)", dis_style_text);
 	  return;
 	}
       s = att_names16[dx_reg - ax_reg];
@@ -12060,10 +12151,10 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
 	ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
       break;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_maybe_intel (ins, s, dis_style_register);
 }
 
 static void
@@ -12108,17 +12199,17 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
       break;
     case const_1_mode:
       if (ins->intel_syntax)
-	oappend (ins, "1");
+	oappend (ins, "1", dis_style_text);
       return;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       return;
     }
 
   op &= mask;
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12136,7 +12227,7 @@ OP_I64 (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, get64 (ins));
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12184,13 +12275,13 @@ OP_sI (instr_info *ins, int bytemode, int sizeflag)
 	op = get16 (ins);
       break;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       return;
     }
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
 }
 
 static void
@@ -12234,21 +12325,21 @@ OP_J (instr_info *ins, int bytemode, int sizeflag)
 	ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
       break;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       return;
     }
   disp = ((ins->start_pc + (ins->codep - ins->start_codep) + disp) & mask)
 	 | segment;
   set_op (ins, disp, 0);
   print_operand_value (ins, ins->scratchbuf, 1, disp);
-  oappend (ins, ins->scratchbuf);
+  oappend (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
 OP_SEG (instr_info *ins, int bytemode, int sizeflag)
 {
   if (bytemode == w_mode)
-    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg]);
+    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg], dis_style_text);
   else
     OP_E (ins, ins->modrm.mod == 3 ? bytemode : w_mode, sizeflag);
 }
@@ -12273,7 +12364,7 @@ OP_DIR (instr_info *ins, int dummy ATTRIBUTE_UNUSED, int sizeflag)
     sprintf (ins->scratchbuf, "0x%x:0x%x", seg, offset);
   else
     sprintf (ins->scratchbuf, "$0x%x,$0x%x", seg, offset);
-  oappend (ins, ins->scratchbuf);
+  oappend (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
@@ -12294,12 +12385,13 @@ OP_OFF (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
-	  oappend (ins, ":");
+	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
+			       dis_style_register);
+	  oappend (ins, ":", dis_style_text);
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12324,12 +12416,13 @@ OP_OFF64 (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
-	  oappend (ins, ":");
+	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
+			       dis_style_text);
+	  oappend (ins, ":", dis_style_text);
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12350,9 +12443,8 @@ ptr_reg (instr_info *ins, int code, int sizeflag)
     s = att_names32[code - eAX_reg];
   else
     s = att_names16[code - eAX_reg];
-  oappend_maybe_intel (ins, s);
-  *ins->obufp++ = ins->close_char;
-  *ins->obufp = 0;
+  oappend_maybe_intel (ins, s, dis_style_register);
+  oappend_char (ins, ins->close_char, dis_style_text);
 }
 
 static void
@@ -12375,7 +12467,8 @@ OP_ESreg (instr_info *ins, int code, int sizeflag)
 	  intel_operand_size (ins, b_mode, sizeflag);
 	}
     }
-  oappend_maybe_intel (ins, "%es:");
+  oappend_maybe_intel (ins, "%es", dis_style_register);
+  oappend_char (ins, ':', dis_style_text);
   ptr_reg (ins, code, sizeflag);
 }
 
@@ -12425,7 +12518,7 @@ OP_C (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
   else
     add = 0;
   sprintf (ins->scratchbuf, "%%cr%d", ins->modrm.reg + add);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
@@ -12442,7 +12535,7 @@ OP_D (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
     sprintf (ins->scratchbuf, "dr%d", ins->modrm.reg + add);
   else
     sprintf (ins->scratchbuf, "%%db%d", ins->modrm.reg + add);
-  oappend (ins, ins->scratchbuf);
+  oappend (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
@@ -12450,7 +12543,7 @@ OP_T (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
       int sizeflag ATTRIBUTE_UNUSED)
 {
   sprintf (ins->scratchbuf, "%%tr%d", ins->modrm.reg);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
@@ -12470,7 +12563,7 @@ OP_MMX (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -12501,7 +12594,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
     {
       if (reg >= 8)
 	{
-	  oappend (ins, "(bad)");
+	  oappend (ins, "(bad)", dis_style_text);
 	  return;
 	}
       names = att_names_tmm;
@@ -12543,7 +12636,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
     }
   else
     names = att_names_xmm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -12603,7 +12696,7 @@ OP_EM (instr_info *ins, int bytemode, int sizeflag)
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_register);
 }
 
 /* cvt* are the only instructions in sse2 which have
@@ -12629,7 +12722,7 @@ OP_EMC (instr_info *ins, int bytemode, int sizeflag)
   MODRM_CHECK;
   ins->codep++;
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm]);
+  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm], dis_style_register);
 }
 
 static void
@@ -12637,7 +12730,7 @@ OP_MXC (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.reg]);
+  oappend_maybe_intel (ins, att_names_mm[ins->modrm.reg], dis_style_text);
 }
 
 static void
@@ -12813,7 +12906,7 @@ OP_3DNowSuffix (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
   ins->obufp = ins->mnemonicendp;
   mnemonic = Suffix3DNow[*ins->codep++ & 0xff];
   if (mnemonic)
-    oappend (ins, mnemonic);
+    ins->obufp = stpcpy (ins->obufp, mnemonic);
   else
     {
       /* Since a variable sized ins->modrm/ins->sib chunk is between the start
@@ -12902,7 +12995,7 @@ CMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -12959,7 +13052,7 @@ BadOp (instr_info *ins)
 {
   /* Throw away prefixes and 1st. opcode byte.  */
   ins->codep = ins->insn_codep + 1;
-  oappend (ins, "(bad)");
+  ins->obufp = stpcpy (ins->obufp, "(bad)");
 }
 
 static void
@@ -13123,7 +13216,7 @@ XMM_Fixup (instr_info *ins, int reg, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_text);
 }
 
 static void
@@ -13160,7 +13253,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     {
       if (ins->vex.evex && !ins->vex.v)
 	{
-	  oappend (ins, "(bad)");
+	  oappend (ins, "(bad)", dis_style_text);
 	  return;
 	}
 
@@ -13172,7 +13265,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   switch (bytemode)
     {
     case scalar_mode:
-      oappend_maybe_intel (ins, att_names_xmm[reg]);
+      oappend_maybe_intel (ins, att_names_xmm[reg], dis_style_text);
       return;
 
     case vex_vsib_d_w_dq_mode:
@@ -13183,9 +13276,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       if (ins->vex.length == 128
 	  || (bytemode != vex_vsib_d_w_dq_mode
 	      && !ins->vex.w))
-	oappend_maybe_intel (ins, att_names_xmm[reg]);
+	oappend_maybe_intel (ins, att_names_xmm[reg], dis_style_text);
       else
-	oappend_maybe_intel (ins, att_names_ymm[reg]);
+	oappend_maybe_intel (ins, att_names_ymm[reg], dis_style_text);
 
       /* All 3 XMM/YMM registers must be distinct.  */
       modrm_reg = ins->modrm.reg;
@@ -13211,13 +13304,13 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
     case tmm_mode:
       /* All 3 TMM registers must be distinct.  */
       if (reg >= 8)
-	oappend (ins, "(bad)");
+	oappend (ins, "(bad)", dis_style_text);
       else
 	{
 	  /* This must be the 3rd operand.  */
 	  if (ins->obufp != ins->op_out[2])
 	    abort ();
-	  oappend_maybe_intel (ins, att_names_tmm[reg]);
+	  oappend_maybe_intel (ins, att_names_tmm[reg], dis_style_text);
 	  if (reg == ins->modrm.reg || reg == ins->modrm.rm)
 	    strcpy (ins->obufp, "/(bad)");
 	}
@@ -13254,7 +13347,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	case mask_mode:
 	  if (reg > 0x7)
 	    {
-	      oappend (ins, "(bad)");
+	      oappend (ins, "(bad)", dis_style_text);
 	      return;
 	    }
 	  names = att_names_mask;
@@ -13274,14 +13367,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	case mask_mode:
 	  if (reg > 0x7)
 	    {
-	      oappend (ins, "(bad)");
+	      oappend (ins, "(bad)", dis_style_text);
 	      return;
 	    }
 	  names = att_names_mask;
 	  break;
 	default:
 	  /* See PR binutils/20893 for a reproducer.  */
-	  oappend (ins, "(bad)");
+	  oappend (ins, "(bad)", dis_style_text);
 	  return;
 	}
       break;
@@ -13292,7 +13385,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       abort ();
       break;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -13335,7 +13428,7 @@ OP_REG_VexI4 (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (bytemode == x_mode && ins->vex.length == 256)
     names = att_names_ymm;
 
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel (ins, names[reg], dis_style_text);
 
   if (ins->vex.w)
     {
@@ -13352,7 +13445,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 {
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
@@ -13397,7 +13490,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13449,7 +13542,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13497,7 +13590,7 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13526,7 +13619,7 @@ MOVSXD_Fixup (instr_info *ins, int bytemode, int sizeflag)
       *p++ = 'd';
       break;
     default:
-      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
+      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
       break;
     }
 
@@ -13569,7 +13662,7 @@ DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
       || (ins->modrm.mod == 3
 	  && modrm_reg == modrm_rm))
     {
-      oappend (ins, "(bad)");
+      oappend (ins, "(bad)", dis_style_text);
     }
   else
     OP_XMM (ins, bytemode, sizeflag);
@@ -13589,14 +13682,14 @@ OP_Rounding (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       /* Fall through.  */
     case evex_rounding_mode:
       ins->evex_used |= EVEX_b_used;
-      oappend (ins, names_rounding[ins->vex.ll]);
+      oappend (ins, names_rounding[ins->vex.ll], dis_style_text);
       break;
     case evex_sae_mode:
       ins->evex_used |= EVEX_b_used;
-      oappend (ins, "{");
+      oappend (ins, "{", dis_style_text);
       break;
     default:
       abort ();
     }
-  oappend (ins, "sae}");
+  oappend (ins, "sae}", dis_style_text);
 }
-- 
2.25.4



* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
@ 2022-04-29 18:16   ` Vladimir Mezentsev
  2022-05-03 13:15     ` Andrew Burgess
  2022-04-29 18:57   ` H.J. Lu
  2022-05-02  7:28   ` Jan Beulich
  2 siblings, 1 reply; 29+ messages in thread
From: Vladimir Mezentsev @ 2022-04-29 18:16 UTC (permalink / raw)
  To: binutils



On 4/29/22 06:42, Andrew Burgess via Binutils wrote:
> The i386 disassembler is pretty complex.  Most disassembly is done
> indirectly; operands are built into buffers within a struct instr_info
> instance, before finally being printed later in the disassembly
> process.
>
> Sometimes the operand buffers are built in a different order to the
> order in which they will eventually be printed.
>
> Each operand can contain multiple components, e.g. multiple registers,
> immediates, other textual elements (commas, brackets, etc).
>
> When looking for how to apply styling I guess the ideal solution would
> be to move away from the operands being a single string that is built
> up, and instead have each operand be a list of "parts", where each
> part is some text and a style.  Then, when we eventually print the
> operand we would loop over the parts and print each part with the
> correct style.
>
> But it feels like a huge amount of work to move from where we are
> now to that potentially ideal solution.  Plus, the above solution
> would be pretty complex.
>
> So, instead I propose a .... different solution here, one that works
> with the existing infrastructure.
>
> As each operand is built up, piece by piece, we pass through style
> information.  This style information is then encoded into the operand
> buffer (see below for details).  After this the code can continue to
> operate as it does right now in order to manage the set of operand
> buffers.
>
> Then, as each operand is printed we can split the operand buffer into
> chunks at the style marker boundaries, with each chunk being printed
> in the correct style.
>
> For encoding the style information I use the format "~%x~".  As far as
> I can tell the '~' is not otherwise used in the i386 disassembler, so
> this should serve as a unique marker.  To speed up writing and then
> reading the style markers, I take advantage of the fact that there are
> less than 16 styles so I know the '%x' will only ever be a single hex
> character.
>
> In some (not very scientific) benchmarking on my machine,
> disassembling a reasonably large (142M) shared library, I'm not seeing
> any significant slow down in disassembler speed with this change.
>
> Most instructions are now being fully syntax highlighted when I
> disassemble using the --disassembler-color=extended-color option.  I'm
> sure that there are probably still a few corner cases that need fixing
> up, but we can come back to them later I think.
>
> When disassembler syntax highlighting is not being used, then there
> should be no user visible changes after this commit.
> ---
>   opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>   1 file changed, 332 insertions(+), 239 deletions(-)
>
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index 1e3266329c1..c94d316a03f 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -42,12 +42,14 @@
>   #include <setjmp.h>
>   typedef struct instr_info instr_info;
>   
> +#define STYLE_BUFFER_SIZE 10
> +
>   static int print_insn (bfd_vma, instr_info *);
>   static void dofloat (instr_info *, int);
>   static void OP_ST (instr_info *, int, int);
>   static void OP_STi (instr_info *, int, int);
>   static int putop (instr_info *, const char *, int);
> -static void oappend (instr_info *, const char *);
> +static void oappend (instr_info *, const char *, enum disassembler_style);
>   static void append_seg (instr_info *);
>   static void OP_indirE (instr_info *, int, int);
>   static void print_operand_value (instr_info *, char *, int, bfd_vma);
> @@ -166,6 +168,8 @@ struct instr_info
>     char *obufp;
>     char *mnemonicendp;
>     char scratchbuf[100];
> +  char style_buffer[STYLE_BUFFER_SIZE];

I don't see where style_buffer is used.
It looks like neither style_buffer nor STYLE_BUFFER_SIZE is needed.

> +  char staging_area[100];

staging_area is used only in i386_dis_printf().
Why is this not a local array inside i386_dis_printf()?


>     unsigned char *start_codep;
>     unsigned char *insn_codep;
>     unsigned char *codep;
> @@ -248,6 +252,8 @@ struct instr_info
>   
>     enum x86_64_isa isa64;
>   
> +  int (*printf) (instr_info *ins, enum disassembler_style style,
> +		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>   };
>   
>   /* Mark parts used in the REX prefix.  When we are testing for
> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>   /* Like oappend (below), but S is a string starting with '%'.
>      In Intel syntax, the '%' is elided.  */
>   static void
> -oappend_maybe_intel (instr_info *ins, const char *s)
> +oappend_maybe_intel (instr_info *ins, const char *s,
> +		     enum disassembler_style style)
>   {
> -  oappend (ins, s + ins->intel_syntax);
> +  oappend (ins, s + ins->intel_syntax, style);
> +}
> +
> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
> +   STYLE is the default style to use in the fprintf_styled_func calls,
> +   however, FMT might include embedded style markers (see oappend_style),
> +   these embedded markers are not printed, but instead change the style
> +   used in the next fprintf_styled_func call.
> +
> +   Return non-zero to indicate the print call was a success.  */
> +
> +static int ATTRIBUTE_PRINTF_3
> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
> +		 const char *fmt, ...)
> +{
> +  va_list ap;
> +  enum disassembler_style curr_style = style;
> +  char *start, *curr;
> +
> +  va_start (ap, fmt);
> +  vsnprintf (ins->staging_area, 100, fmt, ap);

Using sizeof (ins->staging_area) instead of the literal 100 would be better.

As I wrote above, staging_area can be declared inside i386_dis_printf.
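
A minimal sketch of the two suggestions above (a staging buffer that is
local to the formatting function, sized with sizeof rather than a
repeated literal).  This is not code from the patch: demo_format and its
parameters are hypothetical names used only for illustration.

```c
#include <stdarg.h>
#include <stdio.h>

/* Hypothetical helper, not part of i386-dis.c.  Format FMT into a
   staging buffer local to this function, then copy the result to OUT
   (OUTSZ bytes).  Return non-zero if the text fit in the staging
   buffer without truncation.  */
static int
demo_format (char *out, size_t outsz, const char *fmt, ...)
{
  char staging[100];		/* Local: nothing else needs to see it.  */
  va_list ap;
  int len;

  va_start (ap, fmt);
  /* sizeof tracks the array declaration, so the two cannot drift apart.  */
  len = vsnprintf (staging, sizeof (staging), fmt, ap);
  va_end (ap);

  if (len < 0 || outsz == 0)
    return 0;
  snprintf (out, outsz, "%s", staging);
  return len < (int) sizeof (staging);
}
```

Note that sizeof only gives the buffer size here because staging is an
array; if the buffer were passed around as a char pointer the size would
have to be passed alongside it.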


-Vladimir




* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
  2022-04-29 18:16   ` Vladimir Mezentsev
@ 2022-04-29 18:57   ` H.J. Lu
  2022-05-03 13:14     ` Andrew Burgess
  2022-05-02  7:28   ` Jan Beulich
  2 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2022-04-29 18:57 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Binutils

On Fri, Apr 29, 2022 at 6:48 AM Andrew Burgess via Binutils
<binutils@sourceware.org> wrote:
>
> The i386 disassembler is pretty complex.  Most disassembly is done
> indirectly; operands are built into buffers within a struct instr_info
> instance, before finally being printed later in the disassembly
> process.
>
> Sometimes the operand buffers are built in a different order to the
> order in which they will eventually be printed.
>
> Each operand can contain multiple components, e.g. multiple registers,
> immediates, other textual elements (commas, brackets, etc).
>
> When looking for how to apply styling I guess the ideal solution would
> be to move away from the operands being a single string that is built
> up, and instead have each operand be a list of "parts", where each
> part is some text and a style.  Then, when we eventually print the
> operand we would loop over the parts and print each part with the
> correct style.
>
> But it feels like a huge amount of work to move from where we are
> now to that potentially ideal solution.  Plus, the above solution
> would be pretty complex.
>
> So, instead I propose a .... different solution here, one that works
> with the existing infrastructure.
>
> As each operand is built up, piece by piece, we pass through style
> information.  This style information is then encoded into the operand
> buffer (see below for details).  After this the code can continue to
> operate as it does right now in order to manage the set of operand
> buffers.
>
> Then, as each operand is printed we can split the operand buffer into
> chunks at the style marker boundaries, with each chunk being printed
> in the correct style.
>
> For encoding the style information I use the format "~%x~".  As far as
> I can tell the '~' is not otherwise used in the i386 disassembler, so

Can you use a non-ascii character instead of ~?

> this should serve as a unique marker.  To speed up writing and then
> reading the style markers, I take advantage of the fact that there are
> less than 16 styles so I know the '%x' will only ever be a single hex
> character.
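
To make the scheme described above concrete, here is a small standalone
sketch of writing "~X~" markers into an operand buffer and then splitting
the buffer back into styled chunks.  It does not use any binutils types:
append_styled, render_chunks, STYLE_MARKER and the style numbers are all
made up for illustration, and the "[style:text]" output only stands in for
a styled print call.  Keeping the marker byte in one macro is also where a
different (for example non-ASCII) byte could be tried.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical marker byte; the patch under review uses '~'.  */
#define STYLE_MARKER '~'

static const char marker_str[] = { STYLE_MARKER, '\0' };
static const char hex_digits[] = "0123456789abcdef";

/* Append a three byte style marker followed by TEXT at *BUFP, and
   advance *BUFP to the new terminator.  The caller must provide enough
   space.  STYLE has to fit in a single hex digit.  */
static void
append_styled (char **bufp, unsigned int style, const char *text)
{
  char *p = *bufp;

  assert (style <= 0xf);
  *p++ = STYLE_MARKER;
  *p++ = hex_digits[style];
  *p++ = STYLE_MARKER;
  strcpy (p, text);
  *bufp = p + strlen (text);
}

/* Walk BUF and write each chunk to OUT as "[style:text]".  A malformed
   marker is treated as ordinary text.  Output that does not fit in
   OUTSZ bytes is dropped.  */
static void
render_chunks (const char *buf, char *out, size_t outsz)
{
  unsigned int style = 0;
  size_t used = 0;

  if (outsz == 0)
    return;
  out[0] = '\0';

  while (*buf != '\0')
    {
      /* A well formed marker is: marker byte, one character, marker byte.  */
      if (buf[0] == STYLE_MARKER && buf[1] != '\0' && buf[2] == STYLE_MARKER)
        {
          const char *d = strchr (hex_digits, buf[1]);

          /* Fall back to style 0 if the middle character is not hex.  */
          style = d != NULL ? (unsigned int) (d - hex_digits) : 0;
          buf += 3;
          continue;
        }

      /* The chunk runs up to the next marker byte or the end of BUF.  */
      size_t len = strcspn (buf + 1, marker_str) + 1;
      int n = snprintf (out + used, outsz - used, "[%x:%.*s]",
                        style, (int) len, buf);

      if (n < 0 || (size_t) n >= outsz - used)
        return;
      used += (size_t) n;
      buf += len;
    }
}
```

With three pieces appended using the made-up style numbers 4, 0 and 5, the
buffer holds "~4~%rax~0~,~5~0x10" and the walk yields one chunk per piece.
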
>
> In some (not very scientific) benchmarking on my machine,
> disassembling a reasonably large (142M) shared library, I'm not seeing
> any significant slow down in disassembler speed with this change.
>
> Most instructions are now being fully syntax highlighted when I
> disassemble using the --disassembler-color=extended-color option.  I'm
> sure that there are probably still a few corner cases that need fixing
> up, but we can come back to them later I think.
>
> When disassembler syntax highlighting is not being used, then there
> should be no user visible changes after this commit.
> ---
>  opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>  1 file changed, 332 insertions(+), 239 deletions(-)
>
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index 1e3266329c1..c94d316a03f 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -42,12 +42,14 @@
>  #include <setjmp.h>
>  typedef struct instr_info instr_info;
>
> +#define STYLE_BUFFER_SIZE 10
> +
>  static int print_insn (bfd_vma, instr_info *);
>  static void dofloat (instr_info *, int);
>  static void OP_ST (instr_info *, int, int);
>  static void OP_STi (instr_info *, int, int);
>  static int putop (instr_info *, const char *, int);
> -static void oappend (instr_info *, const char *);
> +static void oappend (instr_info *, const char *, enum disassembler_style);

Please add a new function, oappend_with_style, to take a new
argument and change oappend to call oappend_with_style with
dis_style_text.
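
Roughly what I have in mind, as a standalone sketch rather than a drop-in
change: struct sketch_info and enum sketch_style are stand-ins for
instr_info and enum disassembler_style, the numeric style values are
invented, and the marker writing is inlined only to keep the example
short.

```c
#include <assert.h>
#include <string.h>

/* Stand-ins for the real types; the values here are illustrative only.  */
enum sketch_style
{
  sketch_style_text = 0,
  sketch_style_register = 4
};

struct sketch_info
{
  char obuf[100];
  char *obufp;
};

/* Append S to the operand buffer, preceded by a "~X~" marker for STYLE.
   The caller is responsible for there being enough space.  */
static void
oappend_with_style (struct sketch_info *ins, const char *s,
                    enum sketch_style style)
{
  assert ((unsigned int) style <= 0xf);
  *ins->obufp++ = '~';
  *ins->obufp++ = "0123456789abcdef"[style];
  *ins->obufp++ = '~';
  strcpy (ins->obufp, s);
  ins->obufp += strlen (s);
}

/* Existing callers keep the two argument form and get the text style.  */
static void
oappend (struct sketch_info *ins, const char *s)
{
  oappend_with_style (ins, s, sketch_style_text);
}
```

That way only the call sites that want something other than the text
style need to change, and the diff for the many "... PTR " strings goes
away.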

>  static void append_seg (instr_info *);
>  static void OP_indirE (instr_info *, int, int);
>  static void print_operand_value (instr_info *, char *, int, bfd_vma);
> @@ -166,6 +168,8 @@ struct instr_info
>    char *obufp;
>    char *mnemonicendp;
>    char scratchbuf[100];
> +  char style_buffer[STYLE_BUFFER_SIZE];
> +  char staging_area[100];
>    unsigned char *start_codep;
>    unsigned char *insn_codep;
>    unsigned char *codep;
> @@ -248,6 +252,8 @@ struct instr_info
>
>    enum x86_64_isa isa64;
>
> +  int (*printf) (instr_info *ins, enum disassembler_style style,
> +                const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>  };
>
>  /* Mark parts used in the REX prefix.  When we are testing for
> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>  /* Like oappend (below), but S is a string starting with '%'.
>     In Intel syntax, the '%' is elided.  */
>  static void
> -oappend_maybe_intel (instr_info *ins, const char *s)
> +oappend_maybe_intel (instr_info *ins, const char *s,
> +                    enum disassembler_style style)

oappend_maybe_intel_with_style

>  {
> -  oappend (ins, s + ins->intel_syntax);
> +  oappend (ins, s + ins->intel_syntax, style);
> +}
> +
> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
> +   STYLE is the default style to use in the fprintf_styled_func calls,
> +   however, FMT might include embedded style markers (see oappend_style),
> +   these embedded markers are not printed, but instead change the style
> +   used in the next fprintf_styled_func call.
> +
> +   Return non-zero to indicate the print call was a success.  */
> +
> +static int ATTRIBUTE_PRINTF_3
> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
> +                const char *fmt, ...)
> +{
> +  va_list ap;
> +  enum disassembler_style curr_style = style;
> +  char *start, *curr;
> +
> +  va_start (ap, fmt);
> +  vsnprintf (ins->staging_area, 100, fmt, ap);
> +  va_end (ap);
> +
> +  start = curr = ins->staging_area;
> +
> +  do
> +    {
> +      if (*curr == '\0' || *curr == '~')
> +       {
> +         /* Output content between our START position and CURR.  */
> +         int len = curr - start;
> +         (*ins->info->fprintf_styled_func) (ins->info->stream, curr_style,
> +                                            "%.*s", len, start);
> +         if (*curr == '\0')
> +           break;
> +
> +         /* Update the CURR_STYLE, it is possible here that if the input
> +            is corrupted in some way, then we may set CURR_STYLE to an
> +            invalid value.  Don't worry though, we check for that in a
> +            subsequent if block.  */
> +         ++curr;
> +         if (*curr >= '0' && *curr <= '9')
> +           curr_style = (enum disassembler_style) (*curr - '0');
> +         else if (*curr >= 'a' && *curr <= 'f')
> +           curr_style = (enum disassembler_style) (*curr - 'a' + 10);
> +         else
> +           curr_style = dis_style_text;
> +
> +         /* Skip over the hex character, and the closing '~'.  Also
> +            validate that CURR_STYLE is set to a valid value.  */
> +         ++curr;
> +         if (*curr != '~' || curr_style > dis_style_comment_start)
> +           curr_style = dis_style_text;
> +         ++curr;
> +
> +         /* Reset the START and CURR pointers to after the style marker.  */
> +         start = curr;
> +       }
> +      else
> +       ++curr;
> +    }
> +  while (true);
> +
> +  return 1;
>  }
>
>  static int
> @@ -9317,6 +9387,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>    struct dis_private priv;
>    int prefix_length;
>
> +  ins->printf = i386_dis_printf;
>    ins->isa64 = 0;
>    ins->intel_mnemonic = !SYSV386_COMPAT;
>    ins->op_is_jump = false;
> @@ -9401,8 +9472,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>
>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -                                        _("64-bit address is disabled"));
> +      ins->printf (ins, dis_style_text, _("64-bit address is disabled"));
>        return -1;
>      }
>
> @@ -9451,16 +9521,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>         {
>           name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>           if (name != NULL)
> -           (*ins->info->fprintf_styled_func)
> -             (ins->info->stream, dis_style_mnemonic, "%s", name);
> +           ins->printf (ins, dis_style_mnemonic, "%s", name);
>           else
>             {
>               /* Just print the first byte as a .byte instruction.  */
> -             (*ins->info->fprintf_styled_func)
> -               (ins->info->stream, dis_style_assembler_directive, ".byte ");
> -             (*ins->info->fprintf_styled_func)
> -               (ins->info->stream, dis_style_immediate, "0x%x",
> -                (unsigned int) priv.the_buffer[0]);
> +             ins->printf (ins, dis_style_assembler_directive,
> +                          ".byte ");
> +             ins->printf (ins, dis_style_immediate, "0x%x",
> +                          (unsigned int) priv.the_buffer[0]);
>             }
>
>           return 1;
> @@ -9478,10 +9546,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>        for (i = 0;
>            i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
>            i++)
> -       (*ins->info->fprintf_styled_func)
> -         (ins->info->stream, dis_style_mnemonic, "%s%s",
> -          (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
> -                                            sizeflag));
> +       ins->printf (ins, dis_style_mnemonic, "%s%s",
> +                    (i == 0 ? "" : " "),
> +                    prefix_name (ins, ins->all_prefixes[i], sizeflag));
>        return i;
>      }
>
> @@ -9496,11 +9563,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>        /* Handle ins->prefixes before fwait.  */
>        for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
>            i++)
> -       (*ins->info->fprintf_styled_func)
> -         (ins->info->stream, dis_style_mnemonic, "%s ",
> -          prefix_name (ins, ins->all_prefixes[i], sizeflag));
> -      (*ins->info->fprintf_styled_func)
> -       (ins->info->stream, dis_style_mnemonic, "fwait");
> +       ins->printf (ins, dis_style_mnemonic, "%s ",
> +                    prefix_name (ins, ins->all_prefixes[i], sizeflag));
> +      ins->printf (ins, dis_style_mnemonic, "fwait");
>        return i + 1;
>      }
>
> @@ -9569,14 +9634,15 @@ print_insn (bfd_vma pc, instr_info *ins)
>                   /* Don't print {%k0}.  */
>                   if (ins->vex.mask_register_specifier)
>                     {
> -                     oappend (ins, "{");
> +                     oappend (ins, "{", dis_style_text);
>                       oappend_maybe_intel (ins,
>                                            att_names_mask
> -                                          [ins->vex.mask_register_specifier]);
> -                     oappend (ins, "}");
> +                                          [ins->vex.mask_register_specifier],
> +                                          dis_style_text);
> +                     oappend (ins, "}", dis_style_text);
>                     }
>                   if (ins->vex.zeroing)
> -                   oappend (ins, "{z}");
> +                   oappend (ins, "{z}", dis_style_text);
>
>                   /* S/G insns require a mask and don't allow
>                      zeroing-masking.  */
> @@ -9584,7 +9650,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>                        || dp->op[0].bytemode == vex_vsib_q_w_dq_mode)
>                       && (ins->vex.mask_register_specifier == 0
>                           || ins->vex.zeroing))
> -                   oappend (ins, "/(bad)");
> +                   oappend (ins, "/(bad)", dis_style_text);
>                 }
>             }
>
> @@ -9598,8 +9664,8 @@ print_insn (bfd_vma pc, instr_info *ins)
>                   ins->obufp = ins->op_out[i];
>                   if (*ins->obufp)
>                     continue;
> -                 oappend (ins, names_rounding[ins->vex.ll]);
> -                 oappend (ins, "bad}");
> +                 oappend (ins, names_rounding[ins->vex.ll], dis_style_text);
> +                 oappend (ins, "bad}", dis_style_text);
>                   break;
>                 }
>             }
> @@ -9649,16 +9715,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>       are all 0s in inverted form.  */
>    if (ins->need_vex && ins->vex.register_specifier != 0)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -                                        "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return ins->end_codep - priv.the_buffer;
>      }
>
>    /* If EVEX.z is set, there must be an actual mask register in use.  */
>    if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -                                        "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return ins->end_codep - priv.the_buffer;
>      }
>
> @@ -9669,8 +9733,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>          the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
>        if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
>         {
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, "(bad)");
> +         ins->printf (ins, dis_style_text, "(bad)");
>           return ins->end_codep - priv.the_buffer;
>         }
>        ins->used_prefixes |= PREFIX_DATA;
> @@ -9697,8 +9760,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>           || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
>               && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
>         {
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, "(bad)");
> +         ins->printf (ins, dis_style_text, "(bad)");
>           return ins->end_codep - priv.the_buffer;
>         }
>        break;
> @@ -9748,24 +9810,28 @@ print_insn (bfd_vma pc, instr_info *ins)
>         if (name == NULL)
>           abort ();
>         prefix_length += strlen (name) + 1;
> -       (*ins->info->fprintf_styled_func)
> -         (ins->info->stream, dis_style_mnemonic, "%s ", name);
> +       ins->printf (ins, dis_style_mnemonic, "%s ", name);
>        }
>
>    /* Check maximum code length.  */
>    if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
>      {
> -      (*ins->info->fprintf_styled_func)
> -       (ins->info->stream, dis_style_text, "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return MAX_CODE_LENGTH;
>      }
>
> -  ins->obufp = ins->mnemonicendp;
> -  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
> -    oappend (ins, " ");
> -  oappend (ins, " ");
> -  (*ins->info->fprintf_styled_func)
> -    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
> +  i = strlen (ins->obuf);
> +  if (ins->mnemonicendp == ins->obuf + i)
> +    {
> +      i += prefix_length;
> +      if (i < 6)
> +       i = 6 - i + 1;
> +      else
> +       i = 1;
> +    }
> +  else
> +    i = 0;
> +  ins->printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
>
>    /* The enter and bound instructions are printed with operands in the same
>       order as the intel book; everything else is printed in reverse order.  */
> @@ -9804,8 +9870,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>      if (*op_txt[i])
>        {
>         if (needcomma)
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, ",");
> +         ins->printf (ins, dis_style_text, ",");
>         if (ins->op_index[i] != -1 && !ins->op_riprel[i])
>           {
>             bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
> @@ -9821,18 +9886,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>             (*ins->info->print_address_func) (target, ins->info);
>           }
>         else
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, "%s",
> -                                            op_txt[i]);
> +         ins->printf (ins, dis_style_text, "%s", op_txt[i]);
>         needcomma = 1;
>        }
>
>    for (i = 0; i < MAX_OPERANDS; i++)
>      if (ins->op_index[i] != -1 && ins->op_riprel[i])
>        {
> -       (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                          dis_style_comment_start,
> -                                          "        # ");
> +       ins->printf (ins, dis_style_comment_start, "        # ");
>         (*ins->info->print_address_func) ((bfd_vma)
>                         (ins->start_pc + (ins->codep - ins->start_codep)
>                          + ins->op_address[ins->op_index[i]]), ins->info);
> @@ -10217,15 +10278,18 @@ static void
>  OP_ST (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>         int sizeflag ATTRIBUTE_UNUSED)
>  {
> -  oappend_maybe_intel (ins, "%st");
> +  oappend_maybe_intel (ins, "%st", dis_style_text);
>  }
>
>  static void
>  OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>         int sizeflag ATTRIBUTE_UNUSED)
>  {
> -  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, "%st", dis_style_text);
> +  oappend (ins, "(", dis_style_text);
> +  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
> +  oappend (ins, ins->scratchbuf, dis_style_immediate);
> +  oappend (ins, ")", dis_style_text);
>  }
>
>  /* Capital letters in template are macros.  */
> @@ -10329,7 +10393,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>                 if (!ins->vex.evex || ins->vex.w)
>                   *ins->obufp++ = 'd';
>                 else
> -                 oappend (ins, "{bad}");
> +                 oappend (ins, "{bad}", dis_style_text);
>                 break;
>               default:
>                 abort ();
> @@ -10424,7 +10488,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>               if (!ins->vex.w)
>                 *ins->obufp++ = 'h';
>               else
> -               oappend (ins, "{bad}");
> +               oappend (ins, "{bad}", dis_style_text);
>             }
>           else
>             abort ();
> @@ -10608,7 +10672,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>               if (!ins->vex.evex || !ins->vex.w)
>                 *ins->obufp++ = 's';
>               else
> -               oappend (ins, "{bad}");
> +               oappend (ins, "{bad}", dis_style_text);
>               break;
>             default:
>               abort ();
> @@ -10772,12 +10836,47 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>    return 0;
>  }
>
> +/* Add a style marker "~X~" to *INS->obufp that encodes STYLE.  This
> +   assumes that the buffer pointed to by INS->obufp has space.  In the
> +   style marker 'X' is replaced with a single hex character that represents
> +   STYLE.  */
> +
> +static void
> +oappend_style (instr_info *ins, enum disassembler_style style)
> +{
> +  int num = (int) style;
> +
> +  /* We currently assume that STYLE can be encoded as a single hex
> +     character.  If more styles are added then this might start to fail,
> +     and we'll need to expand this code.  */
> +  if (num > 0xf)
> +    abort ();
> +
> +  *ins->obufp++ = '~';
> +  *ins->obufp++ = (num < 10 ? ('0' + num)
> +                  : ((num < 16) ? ('a' + (num - 10)) : '0'));
> +  *ins->obufp++ = '~';
> +  *ins->obufp = '\0';

Do you need '\0'?
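
Going only by the hunks quoted in this mail, both callers terminate the
buffer themselves: oappend via stpcpy, and oappend_char with its own
store.  If there are no other callers, the store here looks redundant.
A standalone sketch of that reasoning, with made-up names (sketch_info,
sketch_style, sketch_oappend, sketch_oappend_char) and strcpy standing in
for stpcpy:

```c
#include <assert.h>
#include <string.h>

/* Stand-in for the part of instr_info used here.  */
struct sketch_info
{
  char obuf[32];
  char *obufp;
};

/* Write the "~X~" marker and deliberately do NOT store a terminator.  */
static void
sketch_style (struct sketch_info *ins, int num)
{
  assert (num >= 0 && num <= 0xf);
  *ins->obufp++ = '~';
  *ins->obufp++ = "0123456789abcdef"[num];
  *ins->obufp++ = '~';
}

/* String append: strcpy copies the terminator of S.  */
static void
sketch_oappend (struct sketch_info *ins, const char *s, int num)
{
  sketch_style (ins, num);
  strcpy (ins->obufp, s);
  ins->obufp += strlen (s);
}

/* Single character append: terminates explicitly.  */
static void
sketch_oappend_char (struct sketch_info *ins, char c, int num)
{
  sketch_style (ins, num);
  *ins->obufp++ = c;
  *ins->obufp = '\0';
}
```

Even when the buffer starts out full of garbage, the result is a properly
terminated string after either call.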

> +}
> +
>  static void
> -oappend (instr_info *ins, const char *s)
> +oappend (instr_info *ins, const char *s, enum disassembler_style style)
>  {
> +  oappend_style (ins, style);
>    ins->obufp = stpcpy (ins->obufp, s);
>  }
>
> +/* Add a single character C to the buffer pointed to by INS->obufp, marking
> +   the style for the character as STYLE.  */
> +
> +static void
> +oappend_char (instr_info *ins, const char c, enum disassembler_style style)
> +{
> +  oappend_style (ins, style);
> +  *ins->obufp++ = c;
> +  *ins->obufp = '\0';
> +}
> +
>  static void
>  append_seg (instr_info *ins)
>  {
> @@ -10789,33 +10888,34 @@ append_seg (instr_info *ins)
>    switch (ins->active_seg_prefix)
>      {
>      case PREFIX_CS:
> -      oappend_maybe_intel (ins, "%cs:");
> +      oappend_maybe_intel (ins, "%cs", dis_style_register);
>        break;
>      case PREFIX_DS:
> -      oappend_maybe_intel (ins, "%ds:");
> +      oappend_maybe_intel (ins, "%ds", dis_style_register);
>        break;
>      case PREFIX_SS:
> -      oappend_maybe_intel (ins, "%ss:");
> +      oappend_maybe_intel (ins, "%ss", dis_style_register);
>        break;
>      case PREFIX_ES:
> -      oappend_maybe_intel (ins, "%es:");
> +      oappend_maybe_intel (ins, "%es", dis_style_register);
>        break;
>      case PREFIX_FS:
> -      oappend_maybe_intel (ins, "%fs:");
> +      oappend_maybe_intel (ins, "%fs", dis_style_register);
>        break;
>      case PREFIX_GS:
> -      oappend_maybe_intel (ins, "%gs:");
> +      oappend_maybe_intel (ins, "%gs", dis_style_register);
>        break;
>      default:
>        break;
>      }
> +  oappend_char (ins, ':', dis_style_text);
>  }
>
>  static void
>  OP_indirE (instr_info *ins, int bytemode, int sizeflag)
>  {
>    if (!ins->intel_syntax)
> -    oappend (ins, "*");
> +    oappend (ins, "*", dis_style_text);
>    OP_E (ins, bytemode, sizeflag);
>  }
>
> @@ -10931,14 +11031,14 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>           case x_mode:
>           case evex_half_bcst_xmmq_mode:
>             if (ins->vex.w)
> -             oappend (ins, "QWORD PTR ");
> +             oappend (ins, "QWORD PTR ", dis_style_text);
>             else
> -             oappend (ins, "DWORD PTR ");
> +             oappend (ins, "DWORD PTR ", dis_style_text);
>             break;
>           case xh_mode:
>           case evex_half_bcst_xmmqh_mode:
>           case evex_half_bcst_xmmqdh_mode:
> -           oappend (ins, "WORD PTR ");
> +           oappend (ins, "WORD PTR ", dis_style_text);
>             break;
>           default:
>             ins->vex.no_broadcast = true;
> @@ -10951,17 +11051,17 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>      case b_mode:
>      case b_swap_mode:
>      case db_mode:
> -      oappend (ins, "BYTE PTR ");
> +      oappend (ins, "BYTE PTR ", dis_style_text);
>        break;
>      case w_mode:
>      case w_swap_mode:
>      case dw_mode:
> -      oappend (ins, "WORD PTR ");
> +      oappend (ins, "WORD PTR ", dis_style_text);
>        break;
>      case indir_v_mode:
>        if (ins->address_mode == mode_64bit && ins->isa64 == intel64)
>         {
> -         oappend (ins, "QWORD PTR ");
> +         oappend (ins, "QWORD PTR ", dis_style_text);
>           break;
>         }
>        /* Fall through.  */
> @@ -10969,7 +11069,7 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>        if (ins->address_mode == mode_64bit && ((sizeflag & DFLAG)
>                                               || (ins->rex & REX_W)))
>         {
> -         oappend (ins, "QWORD PTR ");
> +         oappend (ins, "QWORD PTR ", dis_style_text);
>           break;
>         }
>        /* Fall through.  */
> @@ -10978,62 +11078,62 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>      case dq_mode:
>        USED_REX (REX_W);
>        if (ins->rex & REX_W)
> -       oappend (ins, "QWORD PTR ");
> +       oappend (ins, "QWORD PTR ", dis_style_text);
>        else if (bytemode == dq_mode)
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        else
>         {
>           if (sizeflag & DFLAG)
> -           oappend (ins, "DWORD PTR ");
> +           oappend (ins, "DWORD PTR ", dis_style_text);
>           else
> -           oappend (ins, "WORD PTR ");
> +           oappend (ins, "WORD PTR ", dis_style_text);
>           ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>         }
>        break;
>      case z_mode:
>        if ((ins->rex & REX_W) || (sizeflag & DFLAG))
>         *ins->obufp++ = 'D';
> -      oappend (ins, "WORD PTR ");
> +      oappend (ins, "WORD PTR ", dis_style_text);
>        if (!(ins->rex & REX_W))
>         ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>        break;
>      case a_mode:
>        if (sizeflag & DFLAG)
> -       oappend (ins, "QWORD PTR ");
> +       oappend (ins, "QWORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>        break;
>      case movsxd_mode:
>        if (!(sizeflag & DFLAG) && ins->isa64 == intel64)
> -       oappend (ins, "WORD PTR ");
> +       oappend (ins, "WORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>        break;
>      case d_mode:
>      case d_swap_mode:
> -      oappend (ins, "DWORD PTR ");
> +      oappend (ins, "DWORD PTR ", dis_style_text);
>        break;
>      case q_mode:
>      case q_swap_mode:
> -      oappend (ins, "QWORD PTR ");
> +      oappend (ins, "QWORD PTR ", dis_style_text);
>        break;
>      case m_mode:
>        if (ins->address_mode == mode_64bit)
> -       oappend (ins, "QWORD PTR ");
> +       oappend (ins, "QWORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        break;
>      case f_mode:
>        if (sizeflag & DFLAG)
> -       oappend (ins, "FWORD PTR ");
> +       oappend (ins, "FWORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>        break;
>      case t_mode:
> -      oappend (ins, "TBYTE PTR ");
> +      oappend (ins, "TBYTE PTR ", dis_style_text);
>        break;
>      case x_mode:
>      case xh_mode:
> @@ -11046,26 +11146,26 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>           switch (ins->vex.length)
>             {
>             case 128:
> -             oappend (ins, "XMMWORD PTR ");
> +             oappend (ins, "XMMWORD PTR ", dis_style_text);
>               break;
>             case 256:
> -             oappend (ins, "YMMWORD PTR ");
> +             oappend (ins, "YMMWORD PTR ", dis_style_text);
>               break;
>             case 512:
> -             oappend (ins, "ZMMWORD PTR ");
> +             oappend (ins, "ZMMWORD PTR ", dis_style_text);
>               break;
>             default:
>               abort ();
>             }
>         }
>        else
> -       oappend (ins, "XMMWORD PTR ");
> +       oappend (ins, "XMMWORD PTR ", dis_style_text);
>        break;
>      case xmm_mode:
> -      oappend (ins, "XMMWORD PTR ");
> +      oappend (ins, "XMMWORD PTR ", dis_style_text);
>        break;
>      case ymm_mode:
> -      oappend (ins, "YMMWORD PTR ");
> +      oappend (ins, "YMMWORD PTR ", dis_style_text);
>        break;
>      case xmmq_mode:
>      case evex_half_bcst_xmmqh_mode:
> @@ -11076,13 +11176,13 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>        switch (ins->vex.length)
>         {
>         case 128:
> -         oappend (ins, "QWORD PTR ");
> +         oappend (ins, "QWORD PTR ", dis_style_text);
>           break;
>         case 256:
> -         oappend (ins, "XMMWORD PTR ");
> +         oappend (ins, "XMMWORD PTR ", dis_style_text);
>           break;
>         case 512:
> -         oappend (ins, "YMMWORD PTR ");
> +         oappend (ins, "YMMWORD PTR ", dis_style_text);
>           break;
>         default:
>           abort ();
> @@ -11095,13 +11195,13 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>        switch (ins->vex.length)
>         {
>         case 128:
> -         oappend (ins, "WORD PTR ");
> +         oappend (ins, "WORD PTR ", dis_style_text);
>           break;
>         case 256:
> -         oappend (ins, "DWORD PTR ");
> +         oappend (ins, "DWORD PTR ", dis_style_text);
>           break;
>         case 512:
> -         oappend (ins, "QWORD PTR ");
> +         oappend (ins, "QWORD PTR ", dis_style_text);
>           break;
>         default:
>           abort ();
> @@ -11115,13 +11215,13 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>        switch (ins->vex.length)
>         {
>         case 128:
> -         oappend (ins, "DWORD PTR ");
> +         oappend (ins, "DWORD PTR ", dis_style_text);
>           break;
>         case 256:
> -         oappend (ins, "QWORD PTR ");
> +         oappend (ins, "QWORD PTR ", dis_style_text);
>           break;
>         case 512:
> -         oappend (ins, "XMMWORD PTR ");
> +         oappend (ins, "XMMWORD PTR ", dis_style_text);
>           break;
>         default:
>           abort ();
> @@ -11134,45 +11234,45 @@ intel_operand_size (instr_info *ins, int bytemode, int sizeflag)
>        switch (ins->vex.length)
>         {
>         case 128:
> -         oappend (ins, "QWORD PTR ");
> +         oappend (ins, "QWORD PTR ", dis_style_text);
>           break;
>         case 256:
> -         oappend (ins, "YMMWORD PTR ");
> +         oappend (ins, "YMMWORD PTR ", dis_style_text);
>           break;
>         case 512:
> -         oappend (ins, "ZMMWORD PTR ");
> +         oappend (ins, "ZMMWORD PTR ", dis_style_text);
>           break;
>         default:
>           abort ();
>         }
>        break;
>      case o_mode:
> -      oappend (ins, "OWORD PTR ");
> +      oappend (ins, "OWORD PTR ", dis_style_text);
>        break;
>      case vex_vsib_d_w_dq_mode:
>      case vex_vsib_q_w_dq_mode:
>        if (!ins->need_vex)
>         abort ();
>        if (ins->vex.w)
> -       oappend (ins, "QWORD PTR ");
> +       oappend (ins, "QWORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        break;
>      case mask_bd_mode:
>        if (!ins->need_vex || ins->vex.length != 128)
>         abort ();
>        if (ins->vex.w)
> -       oappend (ins, "DWORD PTR ");
> +       oappend (ins, "DWORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "BYTE PTR ");
> +       oappend (ins, "BYTE PTR ", dis_style_text);
>        break;
>      case mask_mode:
>        if (!ins->need_vex)
>         abort ();
>        if (ins->vex.w)
> -       oappend (ins, "QWORD PTR ");
> +       oappend (ins, "QWORD PTR ", dis_style_text);
>        else
> -       oappend (ins, "WORD PTR ");
> +       oappend (ins, "WORD PTR ", dis_style_text);
>        break;
>      case v_bnd_mode:
>      case v_bndmk_mode:
> @@ -11221,7 +11321,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>      case bnd_swap_mode:
>        if (reg > 0x3)
>         {
> -         oappend (ins, "(bad)");
> +         oappend (ins, "(bad)", dis_style_text);
>           return;
>         }
>        names = att_names_bnd;
> @@ -11285,7 +11385,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>      case mask_mode:
>        if (reg > 0x7)
>         {
> -         oappend (ins, "(bad)");
> +         oappend (ins, "(bad)", dis_style_text);
>           return;
>         }
>        names = att_names_mask;
> @@ -11293,10 +11393,10 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>      case 0:
>        return;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        return;
>      }
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -11488,7 +11588,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               || bytemode == vex_vsib_q_w_dq_mode
>               || bytemode == vex_sibmem_mode)
>             {
> -             oappend (ins, "(bad)");
> +             oappend (ins, "(bad)", dis_style_text);
>               return;
>             }
>         }
> @@ -11505,7 +11605,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               disp = get32s (ins);
>               if (riprel && bytemode == v_bndmk_mode)
>                 {
> -                 oappend (ins, "(bad)");
> +                 oappend (ins, "(bad)", dis_style_text);
>                   return;
>                 }
>             }
> @@ -11560,11 +11660,14 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               print_displacement (ins, ins->scratchbuf, disp);
>             else
>               print_operand_value (ins, ins->scratchbuf, 1, disp);
> -           oappend (ins, ins->scratchbuf);
> +           oappend (ins, ins->scratchbuf, dis_style_address_offset);
>             if (riprel)
>               {
>                 set_op (ins, disp, 1);
> -               oappend (ins, !addr32flag ? "(%rip)" : "(%eip)");
> +               oappend_char (ins, '(', dis_style_text);
> +               oappend (ins, !addr32flag ? "%rip" : "%eip",
> +                        dis_style_register);
> +               oappend_char (ins, ')', dis_style_text);
>               }
>           }
>
> @@ -11578,17 +11681,17 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>
>        if (havedisp || (ins->intel_syntax && riprel))
>         {
> -         *ins->obufp++ = ins->open_char;
> +         oappend_char (ins, ins->open_char, dis_style_text);
>           if (ins->intel_syntax && riprel)
>             {
>               set_op (ins, disp, 1);
> -             oappend (ins, !addr32flag ? "rip" : "eip");
> +             oappend (ins, !addr32flag ? "rip" : "eip", dis_style_register);
>             }
> -         *ins->obufp = '\0';
>           if (havebase)
>             oappend_maybe_intel (ins,
>                                  (ins->address_mode == mode_64bit && !addr32flag
> -                                 ? att_names64 : att_names32)[rbase]);
> +                                 ? att_names64 : att_names32)[rbase],
> +                                dis_style_register);
>           if (ins->has_sib)
>             {
>               /* ESP/RSP won't allow index.  If base isn't ESP/RSP,
> @@ -11599,41 +11702,34 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>                   || (havebase && base != ESP_REG_NUM))
>                 {
>                   if (!ins->intel_syntax || havebase)
> -                   {
> -                     *ins->obufp++ = ins->separator_char;
> -                     *ins->obufp = '\0';
> -                   }
> +                   oappend_char (ins, ins->separator_char, dis_style_text);
>                   if (indexes)
>                     {
>                       if (ins->address_mode == mode_64bit || vindex < 16)
> -                       oappend_maybe_intel (ins, indexes[vindex]);
> +                       oappend_maybe_intel (ins, indexes[vindex],
> +                                            dis_style_register);
>                       else
> -                       oappend (ins, "(bad)");
> +                       oappend (ins, "(bad)", dis_style_text);
>                     }
>                   else
>                     oappend_maybe_intel (ins,
>                                          ins->address_mode == mode_64bit
>                                          && !addr32flag ? att_index64
> -                                                       : att_index32);
> +                                        : att_index32, dis_style_text);
>
> -                 *ins->obufp++ = ins->scale_char;
> -                 *ins->obufp = '\0';
> +                 oappend_char (ins, ins->scale_char, dis_style_text);
>                   sprintf (ins->scratchbuf, "%d", 1 << scale);
> -                 oappend (ins, ins->scratchbuf);
> +                 oappend (ins, ins->scratchbuf, dis_style_immediate);
>                 }
>             }
>           if (ins->intel_syntax
>               && (disp || ins->modrm.mod != 0 || base == 5))
>             {
>               if (!havedisp || (bfd_signed_vma) disp >= 0)
> -               {
> -                 *ins->obufp++ = '+';
> -                 *ins->obufp = '\0';
> -               }
> +                 oappend_char (ins, '+', dis_style_text);
>               else if (ins->modrm.mod != 1 && disp != -disp)
>                 {
> -                 *ins->obufp++ = '-';
> -                 *ins->obufp = '\0';
> +                 oappend_char (ins, '-', dis_style_text);
>                   disp = - (bfd_signed_vma) disp;
>                 }
>
> @@ -11641,11 +11737,10 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>                 print_displacement (ins, ins->scratchbuf, disp);
>               else
>                 print_operand_value (ins, ins->scratchbuf, 1, disp);
> -             oappend (ins, ins->scratchbuf);
> +             oappend (ins, ins->scratchbuf, dis_style_text);
>             }
>
> -         *ins->obufp++ = ins->close_char;
> -         *ins->obufp = '\0';
> +         oappend_char (ins, ins->close_char, dis_style_text);
>
>           if (check_gather)
>             {
> @@ -11657,7 +11752,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               if (!ins->vex.r)
>                 modrm_reg += 16;
>               if (vindex == modrm_reg)
> -               oappend (ins, "/(bad)");
> +               oappend (ins, "/(bad)", dis_style_text);
>             }
>         }
>        else if (ins->intel_syntax)
> @@ -11666,11 +11761,12 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>             {
>               if (!ins->active_seg_prefix)
>                 {
> -                 oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> -                 oappend (ins, ":");
> +                 oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
> +                                      dis_style_text);
> +                 oappend (ins, ":", dis_style_text);
>                 }
>               print_operand_value (ins, ins->scratchbuf, 1, disp);
> -             oappend (ins, ins->scratchbuf);
> +             oappend (ins, ins->scratchbuf, dis_style_text);
>             }
>         }
>      }
> @@ -11681,7 +11777,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>            || bytemode == vex_vsib_d_w_dq_mode
>            || bytemode == vex_vsib_q_w_dq_mode)
>      {
> -      oappend (ins, "(bad)");
> +      oappend (ins, "(bad)", dis_style_text);
>        return;
>      }
>    else
> @@ -11717,47 +11813,42 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>         if (ins->modrm.mod != 0 || ins->modrm.rm == 6)
>           {
>             print_displacement (ins, ins->scratchbuf, disp);
> -           oappend (ins, ins->scratchbuf);
> +           oappend (ins, ins->scratchbuf, dis_style_text);
>           }
>
>        if (ins->modrm.mod != 0 || ins->modrm.rm != 6)
>         {
> -         *ins->obufp++ = ins->open_char;
> -         *ins->obufp = '\0';
> +         oappend_char (ins, ins->open_char, dis_style_text);
>           oappend (ins,
>                    (ins->intel_syntax ? intel_index16
> -                                     : att_index16)[ins->modrm.rm]);
> +                   : att_index16)[ins->modrm.rm], dis_style_text);
>           if (ins->intel_syntax
>               && (disp || ins->modrm.mod != 0 || ins->modrm.rm == 6))
>             {
>               if ((bfd_signed_vma) disp >= 0)
> -               {
> -                 *ins->obufp++ = '+';
> -                 *ins->obufp = '\0';
> -               }
> +               oappend_char (ins, '+', dis_style_text);
>               else if (ins->modrm.mod != 1)
>                 {
> -                 *ins->obufp++ = '-';
> -                 *ins->obufp = '\0';
> +                 oappend_char (ins, '-', dis_style_text);
>                   disp = - (bfd_signed_vma) disp;
>                 }
>
>               print_displacement (ins, ins->scratchbuf, disp);
> -             oappend (ins, ins->scratchbuf);
> +             oappend (ins, ins->scratchbuf, dis_style_text);
>             }
>
> -         *ins->obufp++ = ins->close_char;
> -         *ins->obufp = '\0';
> +         oappend_char (ins, ins->close_char, dis_style_text);
>         }
>        else if (ins->intel_syntax)
>         {
>           if (!ins->active_seg_prefix)
>             {
> -             oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> -             oappend (ins, ":");
> +             oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
> +                                  dis_style_text);
> +             oappend (ins, ":", dis_style_text);
>             }
>           print_operand_value (ins, ins->scratchbuf, 1, disp & 0xffff);
> -         oappend (ins, ins->scratchbuf);
> +         oappend (ins, ins->scratchbuf, dis_style_text);
>         }
>      }
>    if (ins->vex.b)
> @@ -11773,19 +11864,19 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>           if (bytemode == xh_mode)
>             {
>               if (ins->vex.w)
> -               oappend (ins, "{bad}");
> +               oappend (ins, "{bad}", dis_style_text);
>               else
>                 {
>                   switch (ins->vex.length)
>                     {
>                     case 128:
> -                     oappend (ins, "{1to8}");
> +                     oappend (ins, "{1to8}", dis_style_text);
>                       break;
>                     case 256:
> -                     oappend (ins, "{1to16}");
> +                     oappend (ins, "{1to16}", dis_style_text);
>                       break;
>                     case 512:
> -                     oappend (ins, "{1to32}");
> +                     oappend (ins, "{1to32}", dis_style_text);
>                       break;
>                     default:
>                       abort ();
> @@ -11802,13 +11893,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               switch (ins->vex.length)
>                 {
>                 case 128:
> -                 oappend (ins, "{1to2}");
> +                 oappend (ins, "{1to2}", dis_style_text);
>                   break;
>                 case 256:
> -                 oappend (ins, "{1to4}");
> +                 oappend (ins, "{1to4}", dis_style_text);
>                   break;
>                 case 512:
> -                 oappend (ins, "{1to8}");
> +                 oappend (ins, "{1to8}", dis_style_text);
>                   break;
>                 default:
>                   abort ();
> @@ -11820,13 +11911,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               switch (ins->vex.length)
>                 {
>                 case 128:
> -                 oappend (ins, "{1to4}");
> +                 oappend (ins, "{1to4}", dis_style_text);
>                   break;
>                 case 256:
> -                 oappend (ins, "{1to8}");
> +                 oappend (ins, "{1to8}", dis_style_text);
>                   break;
>                 case 512:
> -                 oappend (ins, "{1to16}");
> +                 oappend (ins, "{1to16}", dis_style_text);
>                   break;
>                 default:
>                   abort ();
> @@ -11836,7 +11927,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>             ins->vex.no_broadcast = true;
>         }
>        if (ins->vex.no_broadcast)
> -       oappend (ins, "{bad}");
> +       oappend (ins, "{bad}", dis_style_text);
>      }
>  }
>
> @@ -11866,7 +11957,7 @@ OP_G (instr_info *ins, int bytemode, int sizeflag)
>  {
>    if (ins->vex.evex && !ins->vex.r && ins->address_mode == mode_64bit)
>      {
> -      oappend (ins, "(bad)");
> +      oappend (ins, "(bad)", dis_style_text);
>        return;
>      }
>
> @@ -11969,7 +12060,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
>      {
>      case es_reg: case ss_reg: case cs_reg:
>      case ds_reg: case fs_reg: case gs_reg:
> -      oappend_maybe_intel (ins, att_names_seg[code - es_reg]);
> +      oappend_maybe_intel (ins, att_names_seg[code - es_reg], dis_style_text);
>        return;
>      }
>
> @@ -12019,10 +12110,10 @@ OP_REG (instr_info *ins, int code, int sizeflag)
>         }
>        break;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        return;
>      }
> -  oappend_maybe_intel (ins, s);
> +  oappend_maybe_intel (ins, s, dis_style_register);
>  }
>
>  static void
> @@ -12035,7 +12126,7 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
>      case indir_dx_reg:
>        if (!ins->intel_syntax)
>         {
> -         oappend (ins, "(%dx)");
> +         oappend (ins, "(%dx)", dis_style_text);
>           return;
>         }
>        s = att_names16[dx_reg - ax_reg];
> @@ -12060,10 +12151,10 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
>         ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>        break;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        return;
>      }
> -  oappend_maybe_intel (ins, s);
> +  oappend_maybe_intel (ins, s, dis_style_register);
>  }
>
>  static void
> @@ -12108,17 +12199,17 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
>        break;
>      case const_1_mode:
>        if (ins->intel_syntax)
> -       oappend (ins, "1");
> +       oappend (ins, "1", dis_style_text);
>        return;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        return;
>      }
>
>    op &= mask;
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, op);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
>    ins->scratchbuf[0] = '\0';
>  }
>
> @@ -12136,7 +12227,7 @@ OP_I64 (instr_info *ins, int bytemode, int sizeflag)
>
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, get64 (ins));
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
>    ins->scratchbuf[0] = '\0';
>  }
>
> @@ -12184,13 +12275,13 @@ OP_sI (instr_info *ins, int bytemode, int sizeflag)
>         op = get16 (ins);
>        break;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        return;
>      }
>
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, op);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
>  }
>
>  static void
> @@ -12234,21 +12325,21 @@ OP_J (instr_info *ins, int bytemode, int sizeflag)
>         ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
>        break;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        return;
>      }
>    disp = ((ins->start_pc + (ins->codep - ins->start_codep) + disp) & mask)
>          | segment;
>    set_op (ins, disp, 0);
>    print_operand_value (ins, ins->scratchbuf, 1, disp);
> -  oappend (ins, ins->scratchbuf);
> +  oappend (ins, ins->scratchbuf, dis_style_text);
>  }
>
>  static void
>  OP_SEG (instr_info *ins, int bytemode, int sizeflag)
>  {
>    if (bytemode == w_mode)
> -    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg]);
> +    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg], dis_style_text);
>    else
>      OP_E (ins, ins->modrm.mod == 3 ? bytemode : w_mode, sizeflag);
>  }
> @@ -12273,7 +12364,7 @@ OP_DIR (instr_info *ins, int dummy ATTRIBUTE_UNUSED, int sizeflag)
>      sprintf (ins->scratchbuf, "0x%x:0x%x", seg, offset);
>    else
>      sprintf (ins->scratchbuf, "$0x%x,$0x%x", seg, offset);
> -  oappend (ins, ins->scratchbuf);
> +  oappend (ins, ins->scratchbuf, dis_style_text);
>  }
>
>  static void
> @@ -12294,12 +12385,13 @@ OP_OFF (instr_info *ins, int bytemode, int sizeflag)
>      {
>        if (!ins->active_seg_prefix)
>         {
> -         oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> -         oappend (ins, ":");
> +         oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
> +                              dis_style_register);
> +         oappend (ins, ":", dis_style_text);
>         }
>      }
>    print_operand_value (ins, ins->scratchbuf, 1, off);
> -  oappend (ins, ins->scratchbuf);
> +  oappend (ins, ins->scratchbuf, dis_style_address_offset);
>  }
>
>  static void
> @@ -12324,12 +12416,13 @@ OP_OFF64 (instr_info *ins, int bytemode, int sizeflag)
>      {
>        if (!ins->active_seg_prefix)
>         {
> -         oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> -         oappend (ins, ":");
> +         oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg],
> +                              dis_style_text);
> +         oappend (ins, ":", dis_style_text);
>         }
>      }
>    print_operand_value (ins, ins->scratchbuf, 1, off);
> -  oappend (ins, ins->scratchbuf);
> +  oappend (ins, ins->scratchbuf, dis_style_address_offset);
>  }
>
>  static void
> @@ -12350,9 +12443,8 @@ ptr_reg (instr_info *ins, int code, int sizeflag)
>      s = att_names32[code - eAX_reg];
>    else
>      s = att_names16[code - eAX_reg];
> -  oappend_maybe_intel (ins, s);
> -  *ins->obufp++ = ins->close_char;
> -  *ins->obufp = 0;
> +  oappend_maybe_intel (ins, s, dis_style_register);
> +  oappend_char (ins, ins->close_char, dis_style_text);
>  }
>
>  static void
> @@ -12375,7 +12467,8 @@ OP_ESreg (instr_info *ins, int code, int sizeflag)
>           intel_operand_size (ins, b_mode, sizeflag);
>         }
>      }
> -  oappend_maybe_intel (ins, "%es:");
> +  oappend_maybe_intel (ins, "%es", dis_style_register);
> +  oappend_char (ins, ':', dis_style_text);
>    ptr_reg (ins, code, sizeflag);
>  }
>
> @@ -12425,7 +12518,7 @@ OP_C (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
>    else
>      add = 0;
>    sprintf (ins->scratchbuf, "%%cr%d", ins->modrm.reg + add);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
>  }
>
>  static void
> @@ -12442,7 +12535,7 @@ OP_D (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
>      sprintf (ins->scratchbuf, "dr%d", ins->modrm.reg + add);
>    else
>      sprintf (ins->scratchbuf, "%%db%d", ins->modrm.reg + add);
> -  oappend (ins, ins->scratchbuf);
> +  oappend (ins, ins->scratchbuf, dis_style_text);
>  }
>
>  static void
> @@ -12450,7 +12543,7 @@ OP_T (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
>        int sizeflag ATTRIBUTE_UNUSED)
>  {
>    sprintf (ins->scratchbuf, "%%tr%d", ins->modrm.reg);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
>  }
>
>  static void
> @@ -12470,7 +12563,7 @@ OP_MMX (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>      }
>    else
>      names = att_names_mm;
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -12501,7 +12594,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
>      {
>        if (reg >= 8)
>         {
> -         oappend (ins, "(bad)");
> +         oappend (ins, "(bad)", dis_style_text);
>           return;
>         }
>        names = att_names_tmm;
> @@ -12543,7 +12636,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
>      }
>    else
>      names = att_names_xmm;
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -12603,7 +12696,7 @@ OP_EM (instr_info *ins, int bytemode, int sizeflag)
>      }
>    else
>      names = att_names_mm;
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_register);
>  }
>
>  /* cvt* are the only instructions in sse2 which have
> @@ -12629,7 +12722,7 @@ OP_EMC (instr_info *ins, int bytemode, int sizeflag)
>    MODRM_CHECK;
>    ins->codep++;
>    ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
> -  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm]);
> +  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm], dis_style_register);
>  }
>
>  static void
> @@ -12637,7 +12730,7 @@ OP_MXC (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>         int sizeflag ATTRIBUTE_UNUSED)
>  {
>    ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
> -  oappend_maybe_intel (ins, att_names_mm[ins->modrm.reg]);
> +  oappend_maybe_intel (ins, att_names_mm[ins->modrm.reg], dis_style_text);
>  }
>
>  static void
> @@ -12813,7 +12906,7 @@ OP_3DNowSuffix (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>    ins->obufp = ins->mnemonicendp;
>    mnemonic = Suffix3DNow[*ins->codep++ & 0xff];
>    if (mnemonic)
> -    oappend (ins, mnemonic);
> +    ins->obufp = stpcpy (ins->obufp, mnemonic);
>    else
>      {
>        /* Since a variable sized ins->modrm/ins->sib chunk is between the start
> @@ -12902,7 +12995,7 @@ CMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -12959,7 +13052,7 @@ BadOp (instr_info *ins)
>  {
>    /* Throw away prefixes and 1st. opcode byte.  */
>    ins->codep = ins->insn_codep + 1;
> -  oappend (ins, "(bad)");
> +  ins->obufp = stpcpy (ins->obufp, "(bad)");
>  }
>
>  static void
> @@ -13123,7 +13216,7 @@ XMM_Fixup (instr_info *ins, int reg, int sizeflag ATTRIBUTE_UNUSED)
>           abort ();
>         }
>      }
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_text);
>  }
>
>  static void
> @@ -13160,7 +13253,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>      {
>        if (ins->vex.evex && !ins->vex.v)
>         {
> -         oappend (ins, "(bad)");
> +         oappend (ins, "(bad)", dis_style_text);
>           return;
>         }
>
> @@ -13172,7 +13265,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    switch (bytemode)
>      {
>      case scalar_mode:
> -      oappend_maybe_intel (ins, att_names_xmm[reg]);
> +      oappend_maybe_intel (ins, att_names_xmm[reg], dis_style_text);
>        return;
>
>      case vex_vsib_d_w_dq_mode:
> @@ -13183,9 +13276,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>        if (ins->vex.length == 128
>           || (bytemode != vex_vsib_d_w_dq_mode
>               && !ins->vex.w))
> -       oappend_maybe_intel (ins, att_names_xmm[reg]);
> +       oappend_maybe_intel (ins, att_names_xmm[reg], dis_style_text);
>        else
> -       oappend_maybe_intel (ins, att_names_ymm[reg]);
> +       oappend_maybe_intel (ins, att_names_ymm[reg], dis_style_text);
>
>        /* All 3 XMM/YMM registers must be distinct.  */
>        modrm_reg = ins->modrm.reg;
> @@ -13211,13 +13304,13 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>      case tmm_mode:
>        /* All 3 TMM registers must be distinct.  */
>        if (reg >= 8)
> -       oappend (ins, "(bad)");
> +       oappend (ins, "(bad)", dis_style_text);
>        else
>         {
>           /* This must be the 3rd operand.  */
>           if (ins->obufp != ins->op_out[2])
>             abort ();
> -         oappend_maybe_intel (ins, att_names_tmm[reg]);
> +         oappend_maybe_intel (ins, att_names_tmm[reg], dis_style_text);
>           if (reg == ins->modrm.reg || reg == ins->modrm.rm)
>             strcpy (ins->obufp, "/(bad)");
>         }
> @@ -13254,7 +13347,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>         case mask_mode:
>           if (reg > 0x7)
>             {
> -             oappend (ins, "(bad)");
> +             oappend (ins, "(bad)", dis_style_text);
>               return;
>             }
>           names = att_names_mask;
> @@ -13274,14 +13367,14 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>         case mask_mode:
>           if (reg > 0x7)
>             {
> -             oappend (ins, "(bad)");
> +             oappend (ins, "(bad)", dis_style_text);
>               return;
>             }
>           names = att_names_mask;
>           break;
>         default:
>           /* See PR binutils/20893 for a reproducer.  */
> -         oappend (ins, "(bad)");
> +         oappend (ins, "(bad)", dis_style_text);
>           return;
>         }
>        break;
> @@ -13292,7 +13385,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>        abort ();
>        break;
>      }
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -13335,7 +13428,7 @@ OP_REG_VexI4 (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    if (bytemode == x_mode && ins->vex.length == 256)
>      names = att_names_ymm;
>
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel (ins, names[reg], dis_style_text);
>
>    if (ins->vex.w)
>      {
> @@ -13352,7 +13445,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>  {
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
>  }
>
>  static void
> @@ -13397,7 +13490,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -13449,7 +13542,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_text);
>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -13497,7 +13590,7 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel (ins, ins->scratchbuf, dis_style_immediate);
>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -13526,7 +13619,7 @@ MOVSXD_Fixup (instr_info *ins, int bytemode, int sizeflag)
>        *p++ = 'd';
>        break;
>      default:
> -      oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
> +      oappend (ins, INTERNAL_DISASSEMBLER_ERROR, dis_style_text);
>        break;
>      }
>
> @@ -13569,7 +13662,7 @@ DistinctDest_Fixup (instr_info *ins, int bytemode, int sizeflag)
>        || (ins->modrm.mod == 3
>           && modrm_reg == modrm_rm))
>      {
> -      oappend (ins, "(bad)");
> +      oappend (ins, "(bad)", dis_style_text);
>      }
>    else
>      OP_XMM (ins, bytemode, sizeflag);
> @@ -13589,14 +13682,14 @@ OP_Rounding (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>        /* Fall through.  */
>      case evex_rounding_mode:
>        ins->evex_used |= EVEX_b_used;
> -      oappend (ins, names_rounding[ins->vex.ll]);
> +      oappend (ins, names_rounding[ins->vex.ll], dis_style_text);
>        break;
>      case evex_sae_mode:
>        ins->evex_used |= EVEX_b_used;
> -      oappend (ins, "{");
> +      oappend (ins, "{", dis_style_text);
>        break;
>      default:
>        abort ();
>      }
> -  oappend (ins, "sae}");
> +  oappend (ins, "sae}", dis_style_text);
>  }
> --
> 2.25.4
>


-- 
H.J.


* Re: [PATCH 1/2] objdump: fix styled printing of addresses
  2022-04-29 13:42 ` [PATCH 1/2] objdump: fix styled printing of addresses Andrew Burgess
@ 2022-05-02  7:14   ` Jan Beulich
  2022-05-03  9:52     ` Andrew Burgess
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-02  7:14 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
> Previous work to add styled disassembler output missed a case in
> objdump_print_addr, which is fixed in this commit.
> ---
>  binutils/objdump.c | 9 +++++----
>  1 file changed, 5 insertions(+), 4 deletions(-)

Okay.

Jan



* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
  2022-04-29 18:16   ` Vladimir Mezentsev
  2022-04-29 18:57   ` H.J. Lu
@ 2022-05-02  7:28   ` Jan Beulich
  2022-05-03 13:12     ` Andrew Burgess
  2 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-02  7:28 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
> The i386 disassembler is pretty complex.  Most disassembly is done
> indirectly; operands are built into buffers within a struct instr_info
> instance, before finally being printed later in the disassembly
> process.
> 
> Sometimes the operand buffers are built in a different order to the
> order in which they will eventually be printed.
> 
> Each operand can contain multiple components, e.g. multiple registers,
> immediates, other textual elements (commas, brackets, etc).
> 
> When looking for how to apply styling I guess the ideal solution would
> be to move away from the operands being a single string that is built
> up, and instead have each operand be a list of "parts", where each
> part is some text and a style.  Then, when we eventually print the
> operand we would loop over the parts and print each part with the
> correct style.
> 
> But it feels like a huge amount of work to move from where we are
> now to that potentially ideal solution.  Plus, the above solution
> would be pretty complex.
> 
> So, instead I propose a .... different solution here, one that works
> with the existing infrastructure.
> 
> As each operand is built up, piece by piece, we pass through style
> information.  This style information is then encoded into the operand
> buffer (see below for details).  After this the code can continue to
> operate as it does right now in order to manage the set of operand
> buffers.
> 
> Then, as each operand is printed we can split the operand buffer into
> chunks at the style marker boundaries, with each chunk being printed
> in the correct style.
> 
> For encoding the style information I use the format "~%x~".  As far as
> I can tell the '~' is not otherwise used in the i386 disassembler, so
> this should serve as a unique marker.  To speed up writing and then
> reading the style markers, I take advantage of the fact that there are
> fewer than 16 styles, so I know the '%x' will only ever be a single hex
> character.
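
For reference, the marker scheme described above can be sketched roughly
as follows. This is only an illustration, not the code from the patch:
append_styled, split_styled, render_chunk and the SKETCH_STYLE_* values
are hypothetical names, and the style numbers are made up rather than
taken from enum disassembler_style.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical style numbers; the real values come from libopcodes.  */
enum sketch_style
{
  SKETCH_STYLE_TEXT = 0,
  SKETCH_STYLE_REGISTER = 1,
  SKETCH_STYLE_IMMEDIATE = 2
};

/* Callback invoked once per chunk of text that shares a single style.  */
typedef void (*chunk_fn) (unsigned style, const char *text, size_t len,
                          void *data);

/* Write the marker "~X~" (X is one hex digit) followed by TEXT at P.
   Return a pointer to the new terminating NUL.  */
static char *
append_styled (char *p, const char *text, unsigned style)
{
  p += sprintf (p, "~%x~", style & 0xf);
  strcpy (p, text);
  return p + strlen (text);
}

/* Walk BUF, calling FN for each run of text between style markers.  */
static void
split_styled (const char *buf, chunk_fn fn, void *data)
{
  unsigned style = SKETCH_STYLE_TEXT;
  const char *start = buf;
  const char *p = buf;

  while (*p != '\0')
    {
      if (p[0] == '~' && isxdigit ((unsigned char) p[1]) && p[2] == '~')
        {
          /* Flush the text seen so far in the previous style.  */
          if (p > start)
            fn (style, start, (size_t) (p - start), data);
          style = (isdigit ((unsigned char) p[1])
                   ? (unsigned) (p[1] - '0')
                   : (unsigned) (tolower ((unsigned char) p[1]) - 'a' + 10));
          p += 3;
          start = p;
        }
      else
        ++p;
    }
  if (p > start)
    fn (style, start, (size_t) (p - start), data);
}

/* Example callback: append "[style:text]" to the string at DATA.  */
static void
render_chunk (unsigned style, const char *text, size_t len, void *data)
{
  char *out = data;
  sprintf (out + strlen (out), "[%x:%.*s]", style, (int) len, text);
}
```

Building "(", "%rip", ")" with append_styled and then passing the buffer
to split_styled hands three chunks to the callback, one per marker. A
real printer would call the styled fprintf callback there instead of
render_chunk.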

Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
to use it to make at least some 64-bit constants better recognizable);
I'm not sure about using non-ASCII though, as that may cause issues with
compilers mishandling non-ASCII characters. I'd soften this to non-alnum,
non-operator characters (perhaps more generally non-printable). OTOH I guess
just about _any_ character could be used in symbol names, so I'm not
convinced such an escaping model can be made generally conflict-free.
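
To illustrate the concern (a sketch only; SKETCH_MARKER and
payload_collides are hypothetical names, and the choice of '\002' is an
assumption for the example, not something settled in this thread): any
single marker byte can be checked against the text about to be appended,
and a control character merely makes a clash less likely than '~' does.

```c
#include <stdbool.h>
#include <string.h>

/* Hypothetical non-printable marker byte, used here in place of '~'.  */
#define SKETCH_MARKER '\002'

/* Return true if PAYLOAD already contains MARKER, in which case an
   in-band marker scheme based on MARKER could misparse the buffer.  */
static bool
payload_collides (const char *payload, char marker)
{
  return strchr (payload, marker) != NULL;
}
```

A name such as "foo~bar" collides with a '~' marker but not with
SKETCH_MARKER, while a string that happens to contain byte 0x02 would
still collide. So the check narrows the problem without removing it.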

Jan



* Re: [PATCH 1/2] objdump: fix styled printing of addresses
  2022-05-02  7:14   ` Jan Beulich
@ 2022-05-03  9:52     ` Andrew Burgess
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-03  9:52 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
>> Previous work to add styled disassembler output missed a case in
>> objdump_print_addr, which is fixed in this commit.
>> ---
>>  binutils/objdump.c | 9 +++++----
>>  1 file changed, 5 insertions(+), 4 deletions(-)
>
> Okay.

Thanks, I pushed this first commit.

Andrew



* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-02  7:28   ` Jan Beulich
@ 2022-05-03 13:12     ` Andrew Burgess
  2022-05-03 15:47       ` H.J. Lu
  2022-05-04  7:58       ` Jan Beulich
  0 siblings, 2 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-03 13:12 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
>> The i386 disassembler is pretty complex.  Most disassembly is done
>> indirectly; operands are built into buffers within a struct instr_info
>> instance, before finally being printed later in the disassembly
>> process.
>> 
>> Sometimes the operand buffers are built in a different order to the
>> order in which they will eventually be printed.
>> 
>> Each operand can contain multiple components, e.g. multiple registers,
>> immediates, other textual elements (commas, brackets, etc).
>> 
>> When looking for how to apply styling I guess the ideal solution would
>> be to move away from the operands being a single string that is built
>> up, and instead have each operand be a list of "parts", where each
>> part is some text and a style.  Then, when we eventually print the
>> operand we would loop over the parts and print each part with the
>> correct style.
>> 
>> But it feels like a huge amount of work to move from where we are
>> now to that potentially ideal solution.  Plus, the above solution
>> would be pretty complex.
>> 
>> So, instead I propose a .... different solution here, one that works
>> with the existing infrastructure.
>> 
>> As each operand is built up, piece by piece, we pass through style
>> information.  This style information is then encoded into the operand
>> buffer (see below for details).  After this the code can continue to
>> operate as it does right now in order to manage the set of operand
>> buffers.
>> 
>> Then, as each operand is printed we can split the operand buffer into
>> chunks at the style marker boundaries, with each chunk being printed
>> in the correct style.
>> 
>> For encoding the style information I use the format "~%x~".  As far as
>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>> this should serve as a unique marker.  To speed up writing and then
>> reading the style markers, I take advantage of the fact that there are
>> less than 16 styles so I know the '%x' will only ever be a single hex
>> character.
>
> Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
> to use it to make at least some 64-bit constants better recognizable);
> I'm not sure about using non-ASCII though, as that may cause issues with
> compilers treating non-ASCII wrong. I'd soften this to non-alnum, non-
> operator characters (perhaps more generally non-printable). Otoh I guess
> about _any_ character could be used in symbol names, so I'm not
> convinced such an escaping model can be generally conflict free.

Hi Jan,

I've addressed all the simple feedback from H.J. and Vladimir, and I
just need to figure out something for the escaping mechanism.

I'm still keen to try and go with an escaping-based solution; my
reasoning is that I think this is the solution least likely to
introduce latent disassembler bugs.

However, that position is based on my belief that there's no exhaustive
test for the i386 based disassembler, i.e. one that tests every single
valid instruction disassembles correctly.  If there was such a test then
I might be more tempted to try something more radical...

That said, if I was going to stick with an escaping scheme, then I have
some ideas for moving forward.

The current scheme relies on the fact that symbols are not printed
directly from the i386 disassembler, instead the i386 disassembler calls
back into the driver application (objdump, gdb) to print the symbol.  As
a result, symbols don't go through the instr_info::obuf buffer.  This
means that we never try to interpret a symbol name for escape
characters.

This avoids one of the issues that you raised: what if the escape
character appears in a symbol name?  The answer is that I just don't
need to worry about this!

So, I only need to ensure that the escape character is:

  (a) not a character that the disassembler currently tries to directly
  print itself, and

  (b) not something that will ever be printed as part of an immediate.

Clearly my choice passes both right now, but it looks like it will not
pass (b) forever.

One possible solution would be to replace all the remaining places where
we directly write to instr_info::obuf with calls to oappend_char.  I
could then extend the oappend API such that we do "real" escaping, that
is (assuming the continued use of '~' for now): '~X' would indicate a
style marker, with X being the style number, and '~~' would indicate a
literal '~' character.  In this way we really wouldn't care which
character we used (though we'd probably pick one that didn't crop up too
often, just for ease of parsing the buffers).
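
To make the "real" escaping idea concrete, here is a rough standalone
sketch.  It is not code from the patch: append_style, append_char,
split_styled, and the emit callback are made-up names for illustration,
and falling back to style 0 on a malformed marker is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define ESC '~'            /* Any character would do under this scheme.  */
#define DEFAULT_STYLE 0u   /* Assumed fallback for a malformed marker.  */

/* Hypothetical callback: receives one run of text in a single style.  */
typedef void (*emit_fn) (unsigned style, const char *text, size_t len,
                         void *data);

/* Append the two-character marker "~X" for STYLE (0..15) at P.  */
static char *
append_style (char *p, unsigned style)
{
  assert (style < 16);
  *p++ = ESC;
  *p++ = "0123456789abcdef"[style];
  *p = '\0';
  return p;
}

/* Append C at P, doubling it when it is the escape character.  */
static char *
append_char (char *p, char c)
{
  if (c == ESC)
    *p++ = ESC;
  *p++ = c;
  *p = '\0';
  return p;
}

/* Decode the single hex digit that follows an escape character.  */
static unsigned
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return (unsigned) (c - '0');
  if (c >= 'a' && c <= 'f')
    return (unsigned) (c - 'a' + 10);
  return DEFAULT_STYLE;
}

/* Walk BUF and call EMIT for each run of text, starting in STYLE.
   "~~" decodes to a literal '~'; "~X" switches to style X.  */
static void
split_styled (const char *buf, unsigned style, emit_fn emit, void *data)
{
  char chunk[128];
  size_t len = 0;
  const char *p = buf;

  while (*p != '\0')
    {
      if (p[0] == ESC && p[1] == ESC)
        {
          chunk[len++] = ESC;
          p += 2;
        }
      else if (p[0] == ESC && p[1] != '\0')
        {
          /* Flush the pending text before changing style.  */
          if (len > 0)
            emit (style, chunk, len, data);
          len = 0;
          style = hex_value (p[1]);
          p += 2;
        }
      else
        chunk[len++] = *p++;

      /* Flush early rather than overflow the local chunk buffer.  */
      if (len == sizeof chunk)
        {
          emit (style, chunk, len, data);
          len = 0;
        }
    }

  if (len > 0)
    emit (style, chunk, len, data);
}

/* Example sink: records each run as "[style:text]".  */
struct sink { char text[256]; };

static void
collect (unsigned style, const char *text, size_t len, void *data)
{
  struct sink *s = data;
  size_t used = strlen (s->text);
  snprintf (s->text + used, sizeof s->text - used, "[%x:%.*s]",
            style, (int) len, text);
}
```

With this shape the writer never needs to know whether the text it
appends contains the escape character, which is why the choice of
character stops mattering.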

An alternative solution would be to pick a non-printable character,
e.g. \001, and use this as the escape character in place of the current
'~'.  This seems to pass the (a) and (b) tests above, and if such a
character does ever appear in a symbol name, then, as I've said above, I
don't believe this would cause us any problems.
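
As a rough illustration of this alternative (again, not code from the
patch; STYLE_MARKER, append_style_marker, append_text, and
strip_style_markers are hypothetical names), the marker keeps the
current three-character shape but uses \001 as the delimiter, so a '~'
in an immediate passes through untouched:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define STYLE_MARKER '\001'   /* Non-printable delimiter, assumed unused.  */

/* Append the marker "\001X\001" for STYLE (0..15) at P.  */
static char *
append_style_marker (char *p, unsigned style)
{
  assert (style < 16);
  *p++ = STYLE_MARKER;
  *p++ = "0123456789abcdef"[style];
  *p++ = STYLE_MARKER;
  *p = '\0';
  return p;
}

/* Append ordinary text at P.  The text must not contain the marker
   character, which is the property tests (a) and (b) rely on.  */
static char *
append_text (char *p, const char *s)
{
  size_t len = strlen (s);
  assert (memchr (s, STYLE_MARKER, len) == NULL);
  memcpy (p, s, len + 1);
  return p + len;
}

/* Copy IN to OUT with every well-formed marker removed, and return the
   number of markers seen.  OUT must hold at least strlen (IN) + 1 bytes.  */
static size_t
strip_style_markers (const char *in, char *out)
{
  size_t count = 0;

  while (*in != '\0')
    {
      if (in[0] == STYLE_MARKER && in[1] != '\0' && in[2] == STYLE_MARKER)
        {
          in += 3;
          ++count;
        }
      else
        *out++ = *in++;
    }

  *out = '\0';
  return count;
}
```

No doubling of literal characters is needed here; the cost is that the
scheme only stays correct for as long as nothing else writes \001 into
the operand buffers.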

Here's a session that demonstrates a symbol containing '~' with the
current patch (obviously the final disassembler call actually has
colour in the output, which all looks correct to me):

  $ cat /tmp/weird.s
          .text
          .global "foo~bar"
          "foo~bar":
          nop
          nop
          nop
          call       "foo~bar"
  $ as -o /tmp/weird.o /tmp/weird.s
  $ ./binutils/objdump --disassembler-color=extended-color -d /tmp/weird.o 
  
  /tmp/weird.o:     file format elf64-x86-64
  
  
  Disassembly of section .text:
  
  0000000000000000 <foo~bar>:
     0:	90                   	nop
     1:	90                   	nop
     2:	90                   	nop
     3:	e8 00 00 00 00       	call   8 <foo~bar+0x8>


Thanks,
Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-04-29 18:57   ` H.J. Lu
@ 2022-05-03 13:14     ` Andrew Burgess
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-03 13:14 UTC (permalink / raw)
  To: H.J. Lu; +Cc: Binutils

"H.J. Lu via Binutils" <binutils@sourceware.org> writes:

> On Fri, Apr 29, 2022 at 6:48 AM Andrew Burgess via Binutils
> <binutils@sourceware.org> wrote:
>>
>> The i386 disassembler is pretty complex.  Most disassembly is done
>> indirectly; operands are built into buffers within a struct instr_info
>> instance, before finally being printed later in the disassembly
>> process.
>>
>> Sometimes the operand buffers are built in a different order to the
>> order in which they will eventually be printed.
>>
>> Each operand can contain multiple components, e.g. multiple registers,
>> immediates, other textual elements (commas, brackets, etc).
>>
>> When looking for how to apply styling I guess the ideal solution would
>> be to move away from the operands being a single string that is built
>> up, and instead have each operand be a list of "parts", where each
>> part is some text and a style.  Then, when we eventually print the
>> operand we would loop over the parts and print each part with the
>> correct style.
>>
>> But it feels like a huge amount of work to move from where we are
>> now to that potentially ideal solution.  Plus, the above solution
>> would be pretty complex.
>>
>> So, instead I propose a .... different solution here, one that works
>> with the existing infrastructure.
>>
>> As each operand is built up, piece by piece, we pass through style
>> information.  This style information is then encoded into the operand
>> buffer (see below for details).  After this the code can continue to
>> operate as it does right now in order to manage the set of operand
>> buffers.
>>
>> Then, as each operand is printed we can split the operand buffer into
>> chunks at the style marker boundaries, with each chunk being printed
>> in the correct style.
>>
>> For encoding the style information I use the format "~%x~".  As far as
>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>
> Can you use a non-ascii character instead of ~?
>
>> this should serve as a unique marker.  To speed up writing and then
>> reading the style markers, I take advantage of the fact that there are
>> less than 16 styles so I know the '%x' will only ever be a single hex
>> character.
>>
>> In some (not very scientific) benchmarking on my machine,
>> disassembling a reasonably large (142M) shared library, I'm not seeing
>> any significant slow down in disassembler speed with this change.
>>
>> Most instructions are now being fully syntax highlighted when I
>> disassemble using the --disassembler-color=extended-color option.  I'm
>> sure that there are probably still a few corner cases that need fixing
>> up, but we can come back to them later I think.
>>
>> When disassembler syntax highlighting is not being used, then there
>> should be no user visible changes after this commit.
>> ---
>>  opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>>  1 file changed, 332 insertions(+), 239 deletions(-)
>>
>> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
>> index 1e3266329c1..c94d316a03f 100644
>> --- a/opcodes/i386-dis.c
>> +++ b/opcodes/i386-dis.c
>> @@ -42,12 +42,14 @@
>>  #include <setjmp.h>
>>  typedef struct instr_info instr_info;
>>
>> +#define STYLE_BUFFER_SIZE 10
>> +
>>  static int print_insn (bfd_vma, instr_info *);
>>  static void dofloat (instr_info *, int);
>>  static void OP_ST (instr_info *, int, int);
>>  static void OP_STi (instr_info *, int, int);
>>  static int putop (instr_info *, const char *, int);
>> -static void oappend (instr_info *, const char *);
>> +static void oappend (instr_info *, const char *, enum disassembler_style);
>
> Please add a new function, oappend_with_style, to take a new
> argument and change oappend to call oappend_with_style with
> dis_style_text.
>
>>  static void append_seg (instr_info *);
>>  static void OP_indirE (instr_info *, int, int);
>>  static void print_operand_value (instr_info *, char *, int, bfd_vma);
>> @@ -166,6 +168,8 @@ struct instr_info
>>    char *obufp;
>>    char *mnemonicendp;
>>    char scratchbuf[100];
>> +  char style_buffer[STYLE_BUFFER_SIZE];
>> +  char staging_area[100];
>>    unsigned char *start_codep;
>>    unsigned char *insn_codep;
>>    unsigned char *codep;
>> @@ -248,6 +252,8 @@ struct instr_info
>>
>>    enum x86_64_isa isa64;
>>
>> +  int (*printf) (instr_info *ins, enum disassembler_style style,
>> +                const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>>  };
>>
>>  /* Mark parts used in the REX prefix.  When we are testing for
>> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>>  /* Like oappend (below), but S is a string starting with '%'.
>>     In Intel syntax, the '%' is elided.  */
>>  static void
>> -oappend_maybe_intel (instr_info *ins, const char *s)
>> +oappend_maybe_intel (instr_info *ins, const char *s,
>> +                    enum disassembler_style style)
>
> oappend_maybe_intel_with_style
>
>>  {
>> -  oappend (ins, s + ins->intel_syntax);
>> +  oappend (ins, s + ins->intel_syntax, style);
>> +}
>> +
>> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
>> +   STYLE is the default style to use in the fprintf_styled_func calls,
>> +   however, FMT might include embedded style markers (see oappend_style),
>> +   these embedded markers are not printed, but instead change the style
>> +   used in the next fprintf_styled_func call.
>> +
>> +   Return non-zero to indicate the print call was a success.  */
>> +
>> +static int ATTRIBUTE_PRINTF_3
>> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
>> +                const char *fmt, ...)
>> +{
>> +  va_list ap;
>> +  enum disassembler_style curr_style = style;
>> +  char *start, *curr;
>> +
>> +  va_start (ap, fmt);
>> +  vsnprintf (ins->staging_area, 100, fmt, ap);
>> +  va_end (ap);
>> +
>> +  start = curr = ins->staging_area;
>> +
>> +  do
>> +    {
>> +      if (*curr == '\0' || *curr == '~')
>> +       {
>> +         /* Output content between our START position and CURR.  */
>> +         int len = curr - start;
>> +         (*ins->info->fprintf_styled_func) (ins->info->stream, curr_style,
>> +                                            "%.*s", len, start);
>> +         if (*curr == '\0')
>> +           break;
>> +
>> +         /* Update the CURR_STYLE, it is possible here that if the input
>> +            is corrupted in some way, then we may set CURR_STYLE to an
>> +            invalid value.  Don't worry though, we check for that in a
>> +            subsequent if block.  */
>> +         ++curr;
>> +         if (*curr >= '0' && *curr <= '9')
>> +           curr_style = (enum disassembler_style) (*curr - '0');
>> +         else if (*curr >= 'a' && *curr <= 'f')
>> +           curr_style = (enum disassembler_style) (*curr - 'a' + 10);
>> +         else
>> +           curr_style = dis_style_text;
>> +
>> +         /* Skip over the hex character, and the closing '~'.  Also
>> +            validate that CURR_STYLE is set to a valid value.  */
>> +         ++curr;
>> +         if (*curr != '~' || curr_style > dis_style_comment_start)
>> +           curr_style = dis_style_text;
>> +         ++curr;
>> +
>> +         /* Reset the START and CURR pointers to after the style marker.  */
>> +         start = curr;
>> +       }
>> +      else
>> +       ++curr;
>> +    }
>> +  while (true);
>> +
>> +  return 1;
>>  }
>>
>>  static int
>> @@ -9317,6 +9387,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>    struct dis_private priv;
>>    int prefix_length;
>>
>> +  ins->printf = i386_dis_printf;
>>    ins->isa64 = 0;
>>    ins->intel_mnemonic = !SYSV386_COMPAT;
>>    ins->op_is_jump = false;
>> @@ -9401,8 +9472,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>
>>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>>      {
>> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> -                                        _("64-bit address is disabled"));
>> +      ins->printf (ins, dis_style_text, _("64-bit address is disabled"));
>>        return -1;
>>      }
>>
>> @@ -9451,16 +9521,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>>         {
>>           name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>>           if (name != NULL)
>> -           (*ins->info->fprintf_styled_func)
>> -             (ins->info->stream, dis_style_mnemonic, "%s", name);
>> +           ins->printf (ins, dis_style_mnemonic, "%s", name);
>>           else
>>             {
>>               /* Just print the first byte as a .byte instruction.  */
>> -             (*ins->info->fprintf_styled_func)
>> -               (ins->info->stream, dis_style_assembler_directive, ".byte ");
>> -             (*ins->info->fprintf_styled_func)
>> -               (ins->info->stream, dis_style_immediate, "0x%x",
>> -                (unsigned int) priv.the_buffer[0]);
>> +             ins->printf (ins, dis_style_assembler_directive,
>> +                          ".byte ");
>> +             ins->printf (ins, dis_style_immediate, "0x%x",
>> +                          (unsigned int) priv.the_buffer[0]);
>>             }
>>
>>           return 1;
>> @@ -9478,10 +9546,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>>        for (i = 0;
>>            i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
>>            i++)
>> -       (*ins->info->fprintf_styled_func)
>> -         (ins->info->stream, dis_style_mnemonic, "%s%s",
>> -          (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
>> -                                            sizeflag));
>> +       ins->printf (ins, dis_style_mnemonic, "%s%s",
>> +                    (i == 0 ? "" : " "),
>> +                    prefix_name (ins, ins->all_prefixes[i], sizeflag));
>>        return i;
>>      }
>>
>> @@ -9496,11 +9563,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>>        /* Handle ins->prefixes before fwait.  */
>>        for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
>>            i++)
>> -       (*ins->info->fprintf_styled_func)
>> -         (ins->info->stream, dis_style_mnemonic, "%s ",
>> -          prefix_name (ins, ins->all_prefixes[i], sizeflag));
>> -      (*ins->info->fprintf_styled_func)
>> -       (ins->info->stream, dis_style_mnemonic, "fwait");
>> +       ins->printf (ins, dis_style_mnemonic, "%s ",
>> +                    prefix_name (ins, ins->all_prefixes[i], sizeflag));
>> +      ins->printf (ins, dis_style_mnemonic, "fwait");
>>        return i + 1;
>>      }
>>
>> @@ -9569,14 +9634,15 @@ print_insn (bfd_vma pc, instr_info *ins)
>>                   /* Don't print {%k0}.  */
>>                   if (ins->vex.mask_register_specifier)
>>                     {
>> -                     oappend (ins, "{");
>> +                     oappend (ins, "{", dis_style_text);
>>                       oappend_maybe_intel (ins,
>>                                            att_names_mask
>> -                                          [ins->vex.mask_register_specifier]);
>> -                     oappend (ins, "}");
>> +                                          [ins->vex.mask_register_specifier],
>> +                                          dis_style_text);
>> +                     oappend (ins, "}", dis_style_text);
>>                     }
>>                   if (ins->vex.zeroing)
>> -                   oappend (ins, "{z}");
>> +                   oappend (ins, "{z}", dis_style_text);
>>
>>                   /* S/G insns require a mask and don't allow
>>                      zeroing-masking.  */
>> @@ -9584,7 +9650,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>                        || dp->op[0].bytemode == vex_vsib_q_w_dq_mode)
>>                       && (ins->vex.mask_register_specifier == 0
>>                           || ins->vex.zeroing))
>> -                   oappend (ins, "/(bad)");
>> +                   oappend (ins, "/(bad)", dis_style_text);
>>                 }
>>             }
>>
>> @@ -9598,8 +9664,8 @@ print_insn (bfd_vma pc, instr_info *ins)
>>                   ins->obufp = ins->op_out[i];
>>                   if (*ins->obufp)
>>                     continue;
>> -                 oappend (ins, names_rounding[ins->vex.ll]);
>> -                 oappend (ins, "bad}");
>> +                 oappend (ins, names_rounding[ins->vex.ll], dis_style_text);
>> +                 oappend (ins, "bad}", dis_style_text);
>>                   break;
>>                 }
>>             }
>> @@ -9649,16 +9715,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>>       are all 0s in inverted form.  */
>>    if (ins->need_vex && ins->vex.register_specifier != 0)
>>      {
>> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> -                                        "(bad)");
>> +      ins->printf (ins, dis_style_text, "(bad)");
>>        return ins->end_codep - priv.the_buffer;
>>      }
>>
>>    /* If EVEX.z is set, there must be an actual mask register in use.  */
>>    if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
>>      {
>> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> -                                        "(bad)");
>> +      ins->printf (ins, dis_style_text, "(bad)");
>>        return ins->end_codep - priv.the_buffer;
>>      }
>>
>> @@ -9669,8 +9733,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>          the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
>>        if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
>>         {
>> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
>> -                                            dis_style_text, "(bad)");
>> +         ins->printf (ins, dis_style_text, "(bad)");
>>           return ins->end_codep - priv.the_buffer;
>>         }
>>        ins->used_prefixes |= PREFIX_DATA;
>> @@ -9697,8 +9760,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>           || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
>>               && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
>>         {
>> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
>> -                                            dis_style_text, "(bad)");
>> +         ins->printf (ins, dis_style_text, "(bad)");
>>           return ins->end_codep - priv.the_buffer;
>>         }
>>        break;
>> @@ -9748,24 +9810,28 @@ print_insn (bfd_vma pc, instr_info *ins)
>>         if (name == NULL)
>>           abort ();
>>         prefix_length += strlen (name) + 1;
>> -       (*ins->info->fprintf_styled_func)
>> -         (ins->info->stream, dis_style_mnemonic, "%s ", name);
>> +       ins->printf (ins, dis_style_mnemonic, "%s ", name);
>>        }
>>
>>    /* Check maximum code length.  */
>>    if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
>>      {
>> -      (*ins->info->fprintf_styled_func)
>> -       (ins->info->stream, dis_style_text, "(bad)");
>> +      ins->printf (ins, dis_style_text, "(bad)");
>>        return MAX_CODE_LENGTH;
>>      }
>>
>> -  ins->obufp = ins->mnemonicendp;
>> -  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
>> -    oappend (ins, " ");
>> -  oappend (ins, " ");
>> -  (*ins->info->fprintf_styled_func)
>> -    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
>> +  i = strlen (ins->obuf);
>> +  if (ins->mnemonicendp == ins->obuf + i)
>> +    {
>> +      i += prefix_length;
>> +      if (i < 6)
>> +       i = 6 - i + 1;
>> +      else
>> +       i = 1;
>> +    }
>> +  else
>> +    i = 0;
>> +  ins->printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
>>
>>    /* The enter and bound instructions are printed with operands in the same
>>       order as the intel book; everything else is printed in reverse order.  */
>> @@ -9804,8 +9870,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>      if (*op_txt[i])
>>        {
>>         if (needcomma)
>> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
>> -                                            dis_style_text, ",");
>> +         ins->printf (ins, dis_style_text, ",");
>>         if (ins->op_index[i] != -1 && !ins->op_riprel[i])
>>           {
>>             bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
>> @@ -9821,18 +9886,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>>             (*ins->info->print_address_func) (target, ins->info);
>>           }
>>         else
>> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
>> -                                            dis_style_text, "%s",
>> -                                            op_txt[i]);
>> +         ins->printf (ins, dis_style_text, "%s", op_txt[i]);
>>         needcomma = 1;
>>        }
>>
>>    for (i = 0; i < MAX_OPERANDS; i++)
>>      if (ins->op_index[i] != -1 && ins->op_riprel[i])
>>        {
>> -       (*ins->info->fprintf_styled_func) (ins->info->stream,
>> -                                          dis_style_comment_start,
>> -                                          "        # ");
>> +       ins->printf (ins, dis_style_comment_start, "        # ");
>>         (*ins->info->print_address_func) ((bfd_vma)
>>                         (ins->start_pc + (ins->codep - ins->start_codep)
>>                          + ins->op_address[ins->op_index[i]]), ins->info);
>> @@ -10217,15 +10278,18 @@ static void
>>  OP_ST (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>         int sizeflag ATTRIBUTE_UNUSED)
>>  {
>> -  oappend_maybe_intel (ins, "%st");
>> +  oappend_maybe_intel (ins, "%st", dis_style_text);
>>  }
>>
>>  static void
>>  OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>         int sizeflag ATTRIBUTE_UNUSED)
>>  {
>> -  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
>> -  oappend_maybe_intel (ins, ins->scratchbuf);
>> +  oappend_maybe_intel (ins, "%st", dis_style_text);
>> +  oappend (ins, "(", dis_style_text);
>> +  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
>> +  oappend (ins, ins->scratchbuf, dis_style_immediate);
>> +  oappend (ins, ")", dis_style_text);
>>  }
>>
>>  /* Capital letters in template are macros.  */
>> @@ -10329,7 +10393,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>>                 if (!ins->vex.evex || ins->vex.w)
>>                   *ins->obufp++ = 'd';
>>                 else
>> -                 oappend (ins, "{bad}");
>> +                 oappend (ins, "{bad}", dis_style_text);
>>                 break;
>>               default:
>>                 abort ();
>> @@ -10424,7 +10488,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>>               if (!ins->vex.w)
>>                 *ins->obufp++ = 'h';
>>               else
>> -               oappend (ins, "{bad}");
>> +               oappend (ins, "{bad}", dis_style_text);
>>             }
>>           else
>>             abort ();
>> @@ -10608,7 +10672,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>>               if (!ins->vex.evex || !ins->vex.w)
>>                 *ins->obufp++ = 's';
>>               else
>> -               oappend (ins, "{bad}");
>> +               oappend (ins, "{bad}", dis_style_text);
>>               break;
>>             default:
>>               abort ();
>> @@ -10772,12 +10836,47 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>>    return 0;
>>  }
>>
>> +/* Add a style marker "~X~" to *INS->obufp that encodes STYLE.  This
>> +   assumes that the buffer pointed to by INS->obufp has space.  In the
>> +   style marker 'X' is replaced with a single hex character that represents
>> +   STYLE.  */
>> +
>> +static void
>> +oappend_style (instr_info *ins, enum disassembler_style style)
>> +{
>> +  int num = (int) style;
>> +
>> +  /* We currently assume that STYLE can be encoded as a single hex
>> +     character.  If more styles are added then this might start to fail,
>> +     and we'll need to expand this code.  */
>> +  if (num > 0xf)
>> +    abort ();
>> +
>> +  *ins->obufp++ = '~';
>> +  *ins->obufp++ = (num < 10 ? ('0' + num)
>> +                  : ((num < 16) ? ('a' + (num - 10)) : '0'));
>> +  *ins->obufp++ = '~';
>> +  *ins->obufp = '\0';
>
> Do you need '\0'?

No, not really.  I found having it there helpful for debugging, as it
leaves the buffer null-terminated.

I've left this in, but added a comment explaining that it's not needed
but may be helpful for debugging; is that OK?

I've addressed all your other comments, except the use of '~' - Jan has
also asked about that, so I've followed up to Jan to get some more
direction before changing that part of the patch.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-04-29 18:16   ` Vladimir Mezentsev
@ 2022-05-03 13:15     ` Andrew Burgess
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-03 13:15 UTC (permalink / raw)
  To: Vladimir Mezentsev, binutils

Vladimir Mezentsev via Binutils <binutils@sourceware.org> writes:

> On 4/29/22 06:42, Andrew Burgess via Binutils wrote:
>> The i386 disassembler is pretty complex.  Most disassembly is done
>> indirectly; operands are built into buffers within a struct instr_info
>> instance, before finally being printed later in the disassembly
>> process.
>>
>> Sometimes the operand buffers are built in a different order to the
>> order in which they will eventually be printed.
>>
>> Each operand can contain multiple components, e.g. multiple registers,
>> immediates, other textual elements (commas, brackets, etc).
>>
>> When looking for how to apply styling I guess the ideal solution would
>> be to move away from the operands being a single string that is built
>> up, and instead have each operand be a list of "parts", where each
>> part is some text and a style.  Then, when we eventually print the
>> operand we would loop over the parts and print each part with the
>> correct style.
>>
>> But it feels like a huge amount of work to move from where we are
>> now to that potentially ideal solution.  Plus, the above solution
>> would be pretty complex.
>>
>> So, instead I propose a .... different solution here, one that works
>> with the existing infrastructure.
>>
>> As each operand is built up, piece by piece, we pass through style
>> information.  This style information is then encoded into the operand
>> buffer (see below for details).  After this the code can continue to
>> operate as it does right now in order to manage the set of operand
>> buffers.
>>
>> Then, as each operand is printed we can split the operand buffer into
>> chunks at the style marker boundaries, with each chunk being printed
>> in the correct style.
>>
>> For encoding the style information I use the format "~%x~".  As far as
>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>> this should serve as a unique marker.  To speed up writing and then
>> reading the style markers, I take advantage of the fact that there are
>> less than 16 styles so I know the '%x' will only ever be a single hex
>> character.
>>
>> In some (not very scientific) benchmarking on my machine,
>> disassembling a reasonably large (142M) shared library, I'm not seeing
>> any significant slow down in disassembler speed with this change.
>>
>> Most instructions are now being fully syntax highlighted when I
>> disassemble using the --disassembler-color=extended-color option.  I'm
>> sure that there are probably still a few corner cases that need fixing
>> up, but we can come back to them later I think.
>>
>> When disassembler syntax highlighting is not being used, then there
>> should be no user visible changes after this commit.
>> ---
>>   opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>>   1 file changed, 332 insertions(+), 239 deletions(-)
>>
>> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
>> index 1e3266329c1..c94d316a03f 100644
>> --- a/opcodes/i386-dis.c
>> +++ b/opcodes/i386-dis.c
>> @@ -42,12 +42,14 @@
>>   #include <setjmp.h>
>>   typedef struct instr_info instr_info;
>>   
>> +#define STYLE_BUFFER_SIZE 10
>> +
>>   static int print_insn (bfd_vma, instr_info *);
>>   static void dofloat (instr_info *, int);
>>   static void OP_ST (instr_info *, int, int);
>>   static void OP_STi (instr_info *, int, int);
>>   static int putop (instr_info *, const char *, int);
>> -static void oappend (instr_info *, const char *);
>> +static void oappend (instr_info *, const char *, enum disassembler_style);
>>   static void append_seg (instr_info *);
>>   static void OP_indirE (instr_info *, int, int);
>>   static void print_operand_value (instr_info *, char *, int, bfd_vma);
>> @@ -166,6 +168,8 @@ struct instr_info
>>     char *obufp;
>>     char *mnemonicendp;
>>     char scratchbuf[100];
>> +  char style_buffer[STYLE_BUFFER_SIZE];
>
> I don't see where  style_buffer is used.
> It looks like style_buffer and  STYLE_BUFFER_SIZE are not needed.
>
>> +  char staging_area[100];
>
> staging_area is used only in i386_dis_printf().
> Why is this not a local array inside i386_dis_printf()?
>
>
>>     unsigned char *start_codep;
>>     unsigned char *insn_codep;
>>     unsigned char *codep;
>> @@ -248,6 +252,8 @@ struct instr_info
>>   
>>     enum x86_64_isa isa64;
>>   
>> +  int (*printf) (instr_info *ins, enum disassembler_style style,
>> +		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>>   };
>>   
>>   /* Mark parts used in the REX prefix.  When we are testing for
>> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>>   /* Like oappend (below), but S is a string starting with '%'.
>>      In Intel syntax, the '%' is elided.  */
>>   static void
>> -oappend_maybe_intel (instr_info *ins, const char *s)
>> +oappend_maybe_intel (instr_info *ins, const char *s,
>> +		     enum disassembler_style style)
>>   {
>> -  oappend (ins, s + ins->intel_syntax);
>> +  oappend (ins, s + ins->intel_syntax, style);
>> +}
>> +
>> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
>> +   STYLE is the default style to use in the fprintf_styled_func calls,
>> +   however, FMT might include embedded style markers (see oappend_style),
>> +   these embedded markers are not printed, but instead change the style
>> +   used in the next fprintf_styled_func call.
>> +
>> +   Return non-zero to indicate the print call was a success.  */
>> +
>> +static int ATTRIBUTE_PRINTF_3
>> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
>> +		 const char *fmt, ...)
>> +{
>> +  va_list ap;
>> +  enum disassembler_style curr_style = style;
>> +  char *start, *curr;
>> +
>> +  va_start (ap, fmt);
>> +  vsnprintf (ins->staging_area, 100, fmt, ap);
>
> Maybe sizeof (ins->staging_area) instead of 100 is better.
>
> As I wrote above,  staging_area  can be declared inside i386_dis_printf.

Vladimir,

Thanks, I've addressed all these issues in my local branch.  Once I've
resolved the use of '~' that H.J. and Jan have asked about I'll post an
updated version.

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-03 13:12     ` Andrew Burgess
@ 2022-05-03 15:47       ` H.J. Lu
  2022-05-04  7:58       ` Jan Beulich
  1 sibling, 0 replies; 29+ messages in thread
From: H.J. Lu @ 2022-05-03 15:47 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Jan Beulich, binutils

On Tue, May 3, 2022 at 6:14 AM Andrew Burgess via Binutils
<binutils@sourceware.org> wrote:
>
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>
> > On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
> >> The i386 disassembler is pretty complex.  Most disassembly is done
> >> indirectly; operands are built into buffers within a struct instr_info
> >> instance, before finally being printed later in the disassembly
> >> process.
> >>
> >> Sometimes the operand buffers are built in a different order to the
> >> order in which they will eventually be printed.
> >>
> >> Each operand can contain multiple components, e.g. multiple registers,
> >> immediates, other textual elements (commas, brackets, etc).
> >>
> >> When looking for how to apply styling I guess the ideal solution would
> >> be to move away from the operands being a single string that is built
> >> up, and instead have each operand be a list of "parts", where each
> >> part is some text and a style.  Then, when we eventually print the
> >> operand we would loop over the parts and print each part with the
> >> correct style.
> >>
> >> But it feels like a huge amount of work to move from where we are
> >> now to that potentially ideal solution.  Plus, the above solution
> >> would be pretty complex.
> >>
> >> So, instead I propose a .... different solution here, one that works
> >> with the existing infrastructure.
> >>
> >> As each operand is built up, piece by piece, we pass through style
> >> information.  This style information is then encoded into the operand
> >> buffer (see below for details).  After this the code can continue to
> >> operate as it does right now in order to manage the set of operand
> >> buffers.
> >>
> >> Then, as each operand is printed we can split the operand buffer into
> >> chunks at the style marker boundaries, with each chunk being printed
> >> in the correct style.
> >>
> >> For encoding the style information I use the format "~%x~".  As far as
> >> I can tell the '~' is not otherwise used in the i386 disassembler, so
> >> this should serve as a unique marker.  To speed up writing and then
> >> reading the style markers, I take advantage of the fact that there are
> >> less than 16 styles so I know the '%x' will only ever be a single hex
> >> character.
> >
> > Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
> > to use it to make at least some 64-bit constants better recognizable);
> > I'm not sure about using non-ASCII though, as that may cause issues with
> > compilers treating non-ASCII wrong. I'd soften this to non-alnum, non-
> > operator characters (perhaps more generally non-printable). Otoh I guess
> > about _any_ character could be used in symbol names, so I'm not
> > convinced such an escaping model can be generally conflict free.
>
> Hi Jan,
>
> I've addressed all the simple feedback from H.J. and Vladimir, and I
> just need to figure out something for the escaping mechanism.
>
> I'm still keen to try and go with an escaping based solution, my
> reasoning is that I think that this is the solution least likely to
> introduce latent disassembler bugs.
>
> However, that position is based on my belief that there's no exhaustive
> test for the i386 based disassembler, i.e. one that tests every single
> valid instruction disassembles correctly.  If there was such a test then
> I might be more tempted to try something more radical...
>
> That said, if I was going to stick with an escaping scheme, then I have
> some ideas for moving forward.
>
> The current scheme relies on the fact that symbols are not printed
> directly from the i386 disassembler, instead the i386 disassembler calls
> back into the driver application (objdump, gdb) to print the symbol.  As
> a result, symbols don't go through the instr_info::obuf buffer.  This
> means that we never try to interpret a symbol name for escape
> characters.
>
> This means we avoid one of the issues that you raised, what if the
> escape character appears in a symbol name; the answer is, I just don't
> need to worry about this!
>
> So, I only need to ensure that the escape character is:
>
>   (a) not a character that the disassembler currently tries to directly
>   print itself, and
>
>   (b) not something that will ever be printed as part of an immediate.
>
> Clearly my choice passes both right now, but looks like it will not pass
> (b) forever.
>
> One possible solution would be to replace all the remaining places where
> we directly write to instr_info::obuf with calls to oappend_char.  I
> could then extend the oappend API such that we do "real" escaping, that
> is (assuming the continued use of '~' for now): '~X' would indicate a
> style marker, with X being the style number, and '~~' would indicate a
> literal '~' character.  In this way we really wouldn't care which
> character we used (though we'd probably pick one that didn't crop up too
> often just for ease of parsing the buffers).
>
> An alternative solution would be to pick a non-printable character,
> e.g. \001, and use this as the escape character in place of the current
> '~'.  This seems to pass the (a) and (b) tests above, and if such a
> character does ever appear in a symbol name, then, as I've said above, I
> don't believe this would cause us any problems.

I like \001.   We can always change it later.  Let's wait for input from Jan.

> Here's a session that demonstrates a symbol containing '~' with the
> current patch (obviously the final disassembler call actually has
> colour in the output, which all looks correct to me):
>
>   $ cat /tmp/weird.s
>           .text
>           .global "foo~bar"
>           "foo~bar":
>           nop
>           nop
>           nop
>           call       "foo~bar"
>   $ as -o /tmp/weird.o /tmp/weird.s
>   $ ./binutils/objdump --disassembler-color=extended-color -d /tmp/weird.o
>
>   /tmp/weird.o:     file format elf64-x86-64
>
>
>   Disassembly of section .text:
>
>   0000000000000000 <foo~bar>:
>      0: 90                      nop
>      1: 90                      nop
>      2: 90                      nop
>      3: e8 00 00 00 00          call   8 <foo~bar+0x8>
>
>
> Thanks,
> Andrew
>


-- 
H.J.

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-03 13:12     ` Andrew Burgess
  2022-05-03 15:47       ` H.J. Lu
@ 2022-05-04  7:58       ` Jan Beulich
  2022-05-09  9:48         ` Andrew Burgess
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-04  7:58 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils, H.J. Lu

On 03.05.2022 15:12, Andrew Burgess wrote:
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
> 
>> On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
>>> The i386 disassembler is pretty complex.  Most disassembly is done
>>> indirectly; operands are built into buffers within a struct instr_info
>>> instance, before finally being printed later in the disassembly
>>> process.
>>>
>>> Sometimes the operand buffers are built in a different order to the
>>> order in which they will eventually be printed.
>>>
>>> Each operand can contain multiple components, e.g. multiple registers,
>>> immediates, other textual elements (commas, brackets, etc).
>>>
>>> When looking for how to apply styling I guess the ideal solution would
>>> be to move away from the operands being a single string that is built
>>> up, and instead have each operand be a list of "parts", where each
>>> part is some text and a style.  Then, when we eventually print the
>>> operand we would loop over the parts and print each part with the
>>> correct style.
>>>
>>> But it feels like a huge amount of work to move from where we are
>>> now to that potentially ideal solution.  Plus, the above solution
>>> would be pretty complex.
>>>
>>> So, instead I propose a .... different solution here, one that works
>>> with the existing infrastructure.
>>>
>>> As each operand is built up, piece by piece, we pass through style
>>> information.  This style information is then encoded into the operand
>>> buffer (see below for details).  After this the code can continue to
>>> operate as it does right now in order to manage the set of operand
>>> buffers.
>>>
>>> Then, as each operand is printed we can split the operand buffer into
>>> chunks at the style marker boundaries, with each chunk being printed
>>> in the correct style.
>>>
>>> For encoding the style information I use the format "~%x~".  As far as
>>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>>> this should serve as a unique marker.  To speed up writing and then
>>> reading the style markers, I take advantage of the fact that there are
>>> less than 16 styles so I know the '%x' will only ever be a single hex
>>> character.
>>
>> Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
>> to use it to make at least some 64-bit constants better recognizable);
>> I'm not sure about using non-ASCII though, as that may cause issues with
>> compilers treating non-ASCII wrong. I'd soften this to non-alnum, non-
>> operator characters (perhaps more generally non-printable). Otoh I guess
>> about _any_ character could be used in symbol names, so I'm not
>> convinced such an escaping model can be generally conflict free.
> 
> Hi Jan,
> 
> I've addressed all the simple feedback from H.J. and Vladimir, and I
> just need to figure out something for the escaping mechanism.
> 
> I'm still keen to try and go with an escaping based solution, my
> reasoning is that I think that this is the solution least likely to
> introduce latent disassembler bugs.
> 
> However, that position is based on my belief that there's no exhaustive
> test for the i386 based disassembler, i.e. one that tests every single
> valid instruction disassembles correctly.  If there was such a test then
> I might be more tempted to try something more radical...
> 
> That said, if I was going to stick with an escaping scheme, then I have
> some ideas for moving forward.
> 
> The current scheme relies on the fact that symbols are not printed
> directly from the i386 disassembler, instead the i386 disassembler calls
> back into the driver application (objdump, gdb) to print the symbol.  As
> a result, symbols don't go through the instr_info::obuf buffer.  This
> means that we never try to interpret a symbol name for escape
> characters.

Hmm, indeed. I have to admit that I view it as a significant shortcoming
of the disassembler that it doesn't resolve addresses in the output. So
I'd like to at least not see the road being closed towards improving this.

> This means we avoid one of the issues that you raised, what if the
> escape character appears in a symbol name; the answer is, I just don't
> need to worry about this!
> 
> So, I only need to ensure that the escape character is:
> 
>   (a) not a character that the disassembler currently tries to directly
>   print itself, and
> 
>   (b) not something that will ever be printed as part of an immediate.

Or, more generally, as part of any kind of operand.

> Clearly my choice passes both right now, but looks like it will not pass
> (b) forever.
> 
> One possible solution would be to replace all the remaining places where
> we directly write to instr_info::obuf with calls to oappend_char.

I guess this might be troublesome. The way the disassembler works is a
little quirky here and there, and hence one needs to play tricks every
now and then to half-way reasonably deal with certain special cases.

>  I
> could then extend the oappend API such that we do "real" escaping, that
> is (assuming the continued use of '~' for now): '~X' would indicate a
> style marker, with X being the style number, and '~~' would indicate a
> literal '~' character.  In this way we really wouldn't care which
> character we used (though we'd probably pick one that didn't crop up too
> often just for ease of parsing the buffers).
> 
> An alternative solution would be to pick a non-printable character,
> e.g. \001, and use this as the escape character in place of the current
> '~'.  This seems to pass the (a) and (b) tests above, and if such a
> character does ever appear in a symbol name, then, as I've said above, I
> don't believe this would cause us any problems.

I suppose \001 (or a character very close to this, as iirc \001 has
some meaning internally in gas, and I'm not entirely certain none of
these uses can ever "escape" gas) is good to start with. Provided it
is properly abstracted so it can, if necessary, be _very_ easily
changed (by modifying exactly one line, or - if you need both a
single-quoted and a double-quoted instance - two adjacent ones).

Albeit, thinking of this last aspect, maybe it would be better to
only have a double-quoted instance in the first place, and allow
for the escape to be more than a single character if need be ...

And yes - if a symbol name was possible to hit and if that symbol
name contained such an escape sequence, aiui the worst that would
happen is bogus coloring? IOW the escape would not be looked for and
replaced / processed when coloring is disabled?

Jan


^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-04  7:58       ` Jan Beulich
@ 2022-05-09  9:48         ` Andrew Burgess
  2022-05-09 12:54           ` [PATCHv2] " Andrew Burgess
  2022-05-18  7:06           ` [PATCH 2/2] " Jan Beulich
  0 siblings, 2 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-09  9:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 03.05.2022 15:12, Andrew Burgess wrote:
>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> 
>>> On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
>>>> The i386 disassembler is pretty complex.  Most disassembly is done
>>>> indirectly; operands are built into buffers within a struct instr_info
>>>> instance, before finally being printed later in the disassembly
>>>> process.
>>>>
>>>> Sometimes the operand buffers are built in a different order to the
>>>> order in which they will eventually be printed.
>>>>
>>>> Each operand can contain multiple components, e.g. multiple registers,
>>>> immediates, other textual elements (commas, brackets, etc).
>>>>
>>>> When looking for how to apply styling I guess the ideal solution would
>>>> be to move away from the operands being a single string that is built
>>>> up, and instead have each operand be a list of "parts", where each
>>>> part is some text and a style.  Then, when we eventually print the
>>>> operand we would loop over the parts and print each part with the
>>>> correct style.
>>>>
>>>> But it feels like a huge amount of work to move from where we are
>>>> now to that potentially ideal solution.  Plus, the above solution
>>>> would be pretty complex.
>>>>
>>>> So, instead I propose a .... different solution here, one that works
>>>> with the existing infrastructure.
>>>>
>>>> As each operand is built up, piece by piece, we pass through style
>>>> information.  This style information is then encoded into the operand
>>>> buffer (see below for details).  After this the code can continue to
>>>> operate as it does right now in order to manage the set of operand
>>>> buffers.
>>>>
>>>> Then, as each operand is printed we can split the operand buffer into
>>>> chunks at the style marker boundaries, with each chunk being printed
>>>> in the correct style.
>>>>
>>>> For encoding the style information I use the format "~%x~".  As far as
>>>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>>>> this should serve as a unique marker.  To speed up writing and then
>>>> reading the style markers, I take advantage of the fact that there are
>>>> less than 16 styles so I know the '%x' will only ever be a single hex
>>>> character.
>>>
>>> Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
>>> to use it to make at least some 64-bit constants better recognizable);
>>> I'm not sure about using non-ASCII though, as that may cause issues with
>>> compilers treating non-ASCII wrong. I'd soften this to non-alnum, non-
>>> operator characters (perhaps more generally non-printable). Otoh I guess
>>> about _any_ character could be used in symbol names, so I'm not
>>> convinced such an escaping model can be generally conflict free.
>> 
>> Hi Jan,
>> 
>> I've addressed all the simple feedback from H.J. and Vladimir, and I
>> just need to figure out something for the escaping mechanism.
>> 
>> I'm still keen to try and go with an escaping based solution, my
>> reasoning is that I think that this is the solution least likely to
>> introduce latent disassembler bugs.
>> 
>> However, that position is based on my belief that there's no exhaustive
>> test for the i386 based disassembler, i.e. one that tests every single
>> valid instruction disassembles correctly.  If there was such a test then
>> I might be more tempted to try something more radical...
>> 
>> That said, if I was going to stick with an escaping scheme, then I have
>> some ideas for moving forward.
>> 
>> The current scheme relies on the fact that symbols are not printed
>> directly from the i386 disassembler, instead the i386 disassembler calls
>> back into the driver application (objdump, gdb) to print the symbol.  As
>> a result, symbols don't go through the instr_info::obuf buffer.  This
>> means that we never try to interpret a symbol name for escape
>> characters.
>
> Hmm, indeed. I have to admit that I view it as a significant shortcoming
> of the disassembler that it doesn't resolve addresses in the output. So
> I'd like to at least not see the road being closed towards improving this.
>
>> This means we avoid one of the issues that you raised, what if the
>> escape character appears in a symbol name; the answer is, I just don't
>> need to worry about this!
>> 
>> So, I only need to ensure that the escape character is:
>> 
>>   (a) not a character that the disassembler currently tries to directly
>>   print itself, and
>> 
>>   (b) not something that will ever be printed as part of an immediate.
>
> Or, more generally, as part of any kind of operand.

Sure, but the reason I single out immediates here is I think these are
the only operands whose content is not statically known within the
disassembler.

For example, register operands, every possible register operand value is
enumerated within the i386-dis.c source file, right?  So when I proposed
using '~' I could simply search the source file, find no uses, and know
that character is not (currently) used within a register name.

Immediates are different though, for them we rely on libc to generate
the textual representation.

The only other operand type that might contain "unknown" characters
would be a field that contains an address and potentially a symbol name,
but as was already discussed, these are not printed through the
disassembler.

My question then, other than the exceptions I've already listed, are
there other types of operand where the content doesn't already exist
within i386-dis.c?

>
>> Clearly my choice passes both right now, but looks like it will not pass
>> (b) forever.
>> 
>> One possible solution would be to replace all the remaining places where
>> we directly write to instr_info::obuf with calls to oappend_char.
>
> I guess this might be troublesome. The way the disassembler works is a
> little quirky here and there, and hence one needs to play tricks every
> now and then to half-way reasonably deal with certain special cases.
>
>>  I
>> could then extend the oappend API such that we do "real" escaping, that
>> is (assuming the continued use of '~' for now): '~X' would indicate a
>> style marker, with X being the style number, and '~~' would indicate a
>> literal '~' character.  In this way we really wouldn't care which
>> character we used (though we'd probably pick one that didn't crop up too
>> often just for ease of parsing the buffers).
>> 
>> An alternative solution would be to pick a non-printable character,
>> e.g. \001, and use this as the escape character in place of the current
>> '~'.  This seems to pass the (a) and (b) tests above, and if such a
>> character does ever appear in a symbol name, then, as I've said above, I
>> don't believe this would cause us any problems.
>
> I suppose \001 (or a character very close to this, as iirc \001 has
> some meaning internally in gas, and I'm not entirely certain none of
> these uses can ever "escape" gas) is good to start with. Provided it
> is properly abstracted so it can, if necessary, be _very_ easily
> changed (by modifying exactly one line, or - if you need both a
> single-quoted and a double-quoted instance - two adjacent ones).
>
> Albeit, thinking of this last aspect, maybe it would be better to
> only have a double-quoted instance in the first place, and allow
> for the escape to be more than a single character if need be ...
>
> And yes - if a symbol name was possible to hit and if that symbol
> name contained such an escape sequence, aiui the worst that would
> happen is bogus coloring? IOW the escape would not be looked for and
> replaced / processed when coloring is disabled?

Unfortunately this is not correct.  The disassembler always sends
styling information to the user (objdump, gdb, etc), it's the user that
decides if the output should be styled or not.

What this means is that if the disassembler encountered a random symbol
(which would be a pretty big change to the disassembler), and the symbol
did include something like ~a~ (using the current character to make it
more readable here), then the whole '~a~' part would disappear from the
symbol name, this would be seen as a style marker, the text up to the
start of '~a~' would take the previous style, and the text after '~a~'
would take the '0xa' style, but the '~a~' itself would always be
stripped out.

One relatively easy solution here would be to say that, when we add the
ability to include symbol names in the disassembler output buffers, at
that point we can add "true" escaping.  So if your symbol name is
'foo~a~bar' then as this is added to the disassembler buffer we would
actually add 'foo~~a~~bar', and we'd extend the code that parses out
styling information so that it could handle this case.  This feels like
it should be easy enough to do.
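
A rough sketch of the writing side of such "true" escaping might look
like the following.  This is only an illustration under assumptions:
the helper name escape_marker and its signature are made up here, and
nothing like it exists in the posted patch.  It copies a string while
doubling every occurrence of the marker character, so 'foo~a~bar'
becomes 'foo~~a~~bar'.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of the patch: copy SRC into DST,
   doubling every MARKER character so that a reader can tell a literal
   marker from the start of a style marker.  DST_SIZE is the size of
   DST in bytes.  Returns the number of characters written (excluding
   the terminating NUL), or -1 if DST is too small.  */
static int
escape_marker (char *dst, size_t dst_size, const char *src, char marker)
{
  size_t n = 0;

  assert (dst != NULL && src != NULL);

  for (; *src != '\0'; ++src)
    {
      /* A marker character needs two bytes, anything else needs one.  */
      size_t need = (*src == marker) ? 2 : 1;

      /* Always leave room for the terminating NUL.  */
      if (n + need >= dst_size)
	return -1;

      if (*src == marker)
	dst[n++] = marker;
      dst[n++] = *src;
    }

  /* Only reachable with n >= dst_size when DST_SIZE is zero.  */
  if (n >= dst_size)
    return -1;

  dst[n] = '\0';
  return (int) n;
}
```

The reading side would then need to treat a doubled marker as one
literal character before looking for a style marker.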

All we then have to do is convince ourselves that there's no way for the
escape character to make it into the disassembler output from any other
source, and we should be fine.

For example, your concern about \001 escaping from gas.  Other than
within a symbol name, how might the disassembler end up trying to print
this byte?

Thanks,
Andrew


^ permalink raw reply	[flat|nested] 29+ messages in thread

* [PATCHv2] libopcodes: extend the styling within the i386 disassembler
  2022-05-09  9:48         ` Andrew Burgess
@ 2022-05-09 12:54           ` Andrew Burgess
  2022-05-18 12:27             ` Jan Beulich
                               ` (2 more replies)
  2022-05-18  7:06           ` [PATCH 2/2] " Jan Beulich
  1 sibling, 3 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-09 12:54 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

In patch v2:

  - Addressed all minor feedback items from Vladimir, H.J. and Jan,

  - Switched to using \002 as the styling escape character,

  - Escape character is defined once near the top of i386-dis.c making
    it easy to switch to a different character if needed,

  - Detection of the style escape character is stricter in
    i386_dis_printf,

  - Proper error handling in i386_dis_printf, though I can't imagine
    when this would actually trigger.

---

The i386 disassembler is pretty complex.  Most disassembly is done
indirectly; operands are built into buffers within a struct instr_info
instance, before finally being printed later in the disassembly
process.

Sometimes the operand buffers are built in a different order to the
order in which they will eventually be printed.

Each operand can contain multiple components, e.g. multiple registers,
immediates, other textual elements (commas, brackets, etc).

When looking for how to apply styling I guess the ideal solution would
be to move away from the operands being a single string that is built
up, and instead have each operand be a list of "parts", where each
part is some text and a style.  Then, when we eventually print the
operand we would loop over the parts and print each part with the
correct style.

But it feels like a huge amount of work to move from where we are
now to that potentially ideal solution.  Plus, the above solution
would be pretty complex.

So, instead I propose a .... different solution here, one that works
with the existing infrastructure.

As each operand is built up, piece by piece, we pass through style
information.  This style information is then encoded into the operand
buffer (see below for details).  After this the code can continue to
operate as it does right now in order to manage the set of operand
buffers.

Then, as each operand is printed we can split the operand buffer into
chunks at the style marker boundaries, with each chunk being printed
with the correct style.

For encoding the style information I use a single character, currently
\002, followed by the style encoded as a single hex digit, followed
again by the \002 character.

This of course relies on there not being more than 16 styles, but that
is currently true, and hopefully will remain true for the foreseeable
future.
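
As a minimal sketch of this encoding, the three byte marker could be
written like this.  The helper name append_style_marker is
hypothetical and is not the function added by this patch; only
STYLE_MARKER_CHAR matches the definition in the diff below.

```c
#include <assert.h>

/* Same value as STYLE_MARKER_CHAR in the patch.  */
#define STYLE_MARKER_CHAR '\002'

/* Hypothetical helper, for illustration only: write the marker for
   STYLE at DST as STYLE_MARKER_CHAR, one lower-case hex digit, then
   STYLE_MARKER_CHAR again, followed by a terminating NUL.  DST must
   have room for at least four bytes.  Returns a pointer to the NUL,
   which is where the operand text would be appended next.  */
static char *
append_style_marker (char *dst, unsigned int style)
{
  static const char hex[] = "0123456789abcdef";

  /* The encoding only has a single hex digit available.  */
  assert (style < 16);

  *dst++ = STYLE_MARKER_CHAR;
  *dst++ = hex[style];
  *dst++ = STYLE_MARKER_CHAR;
  *dst = '\0';
  return dst;
}
```

Lower-case hex digits are used here because the parsing code in
i386_dis_printf below only decodes '0' to '9' and 'a' to 'f'.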

The other major concern that has arisen around this work is whether
the escape character could ever be encountered in output naturally
generated by the disassembler.  If this did happen then the escape
characters would be stripped from the output, and the wrong styling
would be applied.

However, I don't believe that this is currently a problem.
Disassembler content comes from a number of sources.  First there's
content that is copied directly from the i386-dis.c file, this is things
like register names, and other syntax elements (brackets, commas,
etc).  We can easily check that the i386-dis.c file doesn't contain
our special character.

The next source of content are immediate operands.  The text for these
operands is generated by calls into libc.  By selecting a
non-printable character we can be confident that this is not something
that libc will generate as part of an immediate representation.

The other output that appears to be from the disassembler is operands
that contain addresses and (possibly) symbol names.  It is quite
possible that a symbol name might contain any special character we
could imagine, so is this a problem?

I don't think it is, we don't actually print address and symbol
operands through the disassembler, instead, the disassembler calls
back to the user (objdump, gdb, etc) to print the address and symbol
on its behalf.  This content is printed directly to the output stream,
it does not pass through the i386 disassembler output buffers.  As a
result, we never check this particular output for styling escape
characters.

In some (not very scientific) benchmarking on my machine,
disassembling a reasonably large (142M) shared library, I'm not seeing
any significant slow down in disassembler speed with this change.

Most instructions are now being fully syntax highlighted when I
disassemble using the --disassembler-color=extended-color option.  I'm
sure that there are probably still a few corner cases that need fixing
up, but we can come back to them later I think.

When disassembler syntax highlighting is not being used, then there
should be no user visible changes after this commit.
---
 opcodes/i386-dis.c | 405 +++++++++++++++++++++++++++++++--------------
 1 file changed, 278 insertions(+), 127 deletions(-)

diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 6ef091ea7d7..28834e4650b 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -47,6 +47,8 @@ static void dofloat (instr_info *, int);
 static void OP_ST (instr_info *, int, int);
 static void OP_STi (instr_info *, int, int);
 static int putop (instr_info *, const char *, int);
+static void oappend_with_style (instr_info *, const char *,
+				enum disassembler_style);
 static void oappend (instr_info *, const char *);
 static void append_seg (instr_info *);
 static void OP_indirE (instr_info *, int, int);
@@ -116,6 +118,10 @@ static void FXSAVE_Fixup (instr_info *, int, int);
 static void MOVSXD_Fixup (instr_info *, int, int);
 static void DistinctDest_Fixup (instr_info *, int, int);
 
+/* This character is used to encode style information within the output
+   buffers.  See oappend_insert_style for more details.  */
+#define STYLE_MARKER_CHAR '\002'
+
 struct dis_private {
   /* Points to first byte not fetched.  */
   bfd_byte *max_fetched;
@@ -248,6 +254,8 @@ struct instr_info
 
   enum x86_64_isa isa64;
 
+  int (*printf) (instr_info *ins, enum disassembler_style style,
+		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
 };
 
 /* Mark parts used in the REX prefix.  When we are testing for
@@ -9298,11 +9306,103 @@ get_sib (instr_info *ins, int sizeflag)
 }
 
 /* Like oappend (below), but S is a string starting with '%'.
-   In Intel syntax, the '%' is elided.  */
+   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
+   part of the output in the disassembler.  */
+
+static void
+oappend_maybe_intel_with_style (instr_info *ins, const char *s,
+				enum disassembler_style style)
+{
+  oappend_with_style (ins, s + ins->intel_syntax, style);
+}
+
+/* Like oappend_maybe_intel_with_style, but always uses text style.  */
+
 static void
 oappend_maybe_intel (instr_info *ins, const char *s)
 {
-  oappend (ins, s + ins->intel_syntax);
+  oappend_maybe_intel_with_style (ins, s, dis_style_text);
+}
+
+/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
+   STYLE is the default style to use in the fprintf_styled_func calls,
+   however, FMT might include embedded style markers (see oappend_style),
+   these embedded markers are not printed, but instead change the style
+   used in the next fprintf_styled_func call.
+
+   Return non-zero to indicate the print call was a success.  */
+
+static int ATTRIBUTE_PRINTF_3
+i386_dis_printf (instr_info *ins, enum disassembler_style style,
+		 const char *fmt, ...)
+{
+  va_list ap;
+  enum disassembler_style curr_style = style;
+  char *start, *curr;
+  char staging_area[100];
+  int res;
+
+  va_start (ap, fmt);
+  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
+  va_end (ap);
+
+  if (res < 0)
+    return res;
+
+  start = curr = staging_area;
+
+  do
+    {
+      if (*curr == '\0'
+	  || (*curr == STYLE_MARKER_CHAR
+	      && ISXDIGIT (*(curr + 1))
+	      && *(curr + 2) == STYLE_MARKER_CHAR))
+	{
+	  /* Output content between our START position and CURR.  */
+	  int len = curr - start;
+	  int n = (*ins->info->fprintf_styled_func) (ins->info->stream,
+						     curr_style,
+						     "%.*s", len, start);
+	  if (n < 0)
+	    {
+	      res = n;
+	      break;
+	    }
+
+	  if (*curr == '\0')
+	    break;
+
+	  /* Skip over the initial STYLE_MARKER_CHAR.  */
+	  ++curr;
+
+	  /* Update the CURR_STYLE.  As there are fewer than 16 styles, it
+	     is possible that, if the input is corrupted in some way, we
+	     might set CURR_STYLE to an invalid value.  Don't worry
+	     though, we check for this situation.  */
+	  if (*curr >= '0' && *curr <= '9')
+	    curr_style = (enum disassembler_style) (*curr - '0');
+	  else if (*curr >= 'a' && *curr <= 'f')
+	    curr_style = (enum disassembler_style) (*curr - 'a' + 10);
+	  else
+	    curr_style = dis_style_text;
+
+	  /* Check for an invalid style having been selected.  This should
+	     never happen, but it doesn't hurt to be a little paranoid.  */
+	  if (curr_style > dis_style_comment_start)
+	    curr_style = dis_style_text;
+
+	  /* Skip the hex character, and the closing STYLE_MARKER_CHAR.  */
+	  curr += 2;
+
+	  /* Reset the START to after the style marker.  */
+	  start = curr;
+	}
+      else
+	++curr;
+    }
+  while (true);
+
+  return res;
 }
 
 static int
@@ -9317,6 +9417,7 @@ print_insn (bfd_vma pc, instr_info *ins)
   struct dis_private priv;
   int prefix_length;
 
+  ins->printf = i386_dis_printf;
   ins->isa64 = 0;
   ins->intel_mnemonic = !SYSV386_COMPAT;
   ins->op_is_jump = false;
@@ -9401,8 +9502,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 
   if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 _("64-bit address is disabled"));
+      ins->printf (ins, dis_style_text, _("64-bit address is disabled"));
       return -1;
     }
 
@@ -9451,16 +9551,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	{
 	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
 	  if (name != NULL)
-	    (*ins->info->fprintf_styled_func)
-	      (ins->info->stream, dis_style_mnemonic, "%s", name);
+	    ins->printf (ins, dis_style_mnemonic, "%s", name);
 	  else
 	    {
 	      /* Just print the first byte as a .byte instruction.  */
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_assembler_directive, ".byte ");
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_immediate, "0x%x",
-		 (unsigned int) priv.the_buffer[0]);
+	      ins->printf (ins, dis_style_assembler_directive,
+			   ".byte ");
+	      ins->printf (ins, dis_style_immediate, "0x%x",
+			   (unsigned int) priv.the_buffer[0]);
 	    }
 
 	  return 1;
@@ -9478,10 +9576,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       for (i = 0;
 	   i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s%s",
-	   (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
-					     sizeflag));
+	ins->printf (ins, dis_style_mnemonic, "%s%s",
+		     (i == 0 ? "" : " "),
+		     prefix_name (ins, ins->all_prefixes[i], sizeflag));
       return i;
     }
 
@@ -9496,11 +9593,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       /* Handle ins->prefixes before fwait.  */
       for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ",
-	   prefix_name (ins, ins->all_prefixes[i], sizeflag));
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_mnemonic, "fwait");
+	ins->printf (ins, dis_style_mnemonic, "%s ",
+		     prefix_name (ins, ins->all_prefixes[i], sizeflag));
+      ins->printf (ins, dis_style_mnemonic, "fwait");
       return i + 1;
     }
 
@@ -9649,16 +9744,14 @@ print_insn (bfd_vma pc, instr_info *ins)
      are all 0s in inverted form.  */
   if (ins->need_vex && ins->vex.register_specifier != 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      ins->printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
   /* If EVEX.z is set, there must be an actual mask register in use.  */
   if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      ins->printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
@@ -9669,8 +9762,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	 the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
       if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  ins->printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       ins->used_prefixes |= PREFIX_DATA;
@@ -9697,8 +9789,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	  || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
 	      && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  ins->printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       break;
@@ -9748,24 +9839,28 @@ print_insn (bfd_vma pc, instr_info *ins)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
+	ins->printf (ins, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
   if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
     {
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_text, "(bad)");
+      ins->printf (ins, dis_style_text, "(bad)");
       return MAX_CODE_LENGTH;
     }
 
-  ins->obufp = ins->mnemonicendp;
-  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
-    oappend (ins, " ");
-  oappend (ins, " ");
-  (*ins->info->fprintf_styled_func)
-    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
+  i = strlen (ins->obuf);
+  if (ins->mnemonicendp == ins->obuf + i)
+    {
+      i += prefix_length;
+      if (i < 6)
+	i = 6 - i + 1;
+      else
+	i = 1;
+    }
+  else
+    i = 0;
+  ins->printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
 
   /* The enter and bound instructions are printed with operands in the same
      order as the intel book; everything else is printed in reverse order.  */
@@ -9804,8 +9899,7 @@ print_insn (bfd_vma pc, instr_info *ins)
     if (*op_txt[i])
       {
 	if (needcomma)
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, ",");
+	  ins->printf (ins, dis_style_text, ",");
 	if (ins->op_index[i] != -1 && !ins->op_riprel[i])
 	  {
 	    bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
@@ -9821,18 +9915,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    (*ins->info->print_address_func) (target, ins->info);
 	  }
 	else
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "%s",
-					     op_txt[i]);
+	  ins->printf (ins, dis_style_text, "%s", op_txt[i]);
 	needcomma = 1;
       }
 
   for (i = 0; i < MAX_OPERANDS; i++)
     if (ins->op_index[i] != -1 && ins->op_riprel[i])
       {
-	(*ins->info->fprintf_styled_func) (ins->info->stream,
-					   dis_style_comment_start,
-					   "        # ");
+	ins->printf (ins, dis_style_comment_start, "        # ");
 	(*ins->info->print_address_func) ((bfd_vma)
 			(ins->start_pc + (ins->codep - ins->start_codep)
 			 + ins->op_address[ins->op_index[i]]), ins->info);
@@ -10224,8 +10314,11 @@ static void
 OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
-  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel (ins, "%st");
+  oappend (ins, "(");
+  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_immediate);
+  oappend (ins, ")");
 }
 
 /* Capital letters in template are macros.  */
@@ -10772,12 +10865,64 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
   return 0;
 }
 
+/* Add a style marker to *INS->obufp that encodes STYLE.  This assumes that
+   the buffer pointed to by INS->obufp has space.  A style marker is made
+   from the STYLE_MARKER_CHAR followed by STYLE converted to a single hex
+   digit, followed by another STYLE_MARKER_CHAR.  This function assumes
+   that the number of styles is not greater than 16.  */
+
 static void
-oappend (instr_info *ins, const char *s)
+oappend_insert_style (instr_info *ins, enum disassembler_style style)
 {
+  int num = (int) style;
+
+  /* We currently assume that STYLE can be encoded as a single hex
+     character.  If more styles are added then this might start to fail,
+     and we'll need to expand this code.  */
+  if (num > 0xf)
+    abort ();
+
+  *ins->obufp++ = STYLE_MARKER_CHAR;
+  *ins->obufp++ = (num < 10 ? ('0' + num)
+		   : ((num < 16) ? ('a' + (num - 10)) : '0'));
+  *ins->obufp++ = STYLE_MARKER_CHAR;
+
+  /* This final null character is not strictly necessary, after inserting a
+     style marker we should always be inserting some additional content.
+     However, having the buffer null terminated doesn't cost much, and makes
+     it easier to debug what's going on.  Also, if we do ever forget to add
+     any additional content after this style marker, then the buffer will
+     still be well formed.  */
+  *ins->obufp = '\0';
+}
+
+static void
+oappend_with_style (instr_info *ins, const char *s,
+		    enum disassembler_style style)
+{
+  oappend_insert_style (ins, style);
   ins->obufp = stpcpy (ins->obufp, s);
 }
 
+/* Like oappend_with_style but always with text style.  */
+
+static void
+oappend (instr_info *ins, const char *s)
+{
+  oappend_with_style (ins, s, dis_style_text);
+}
+
+/* Add a single character C to the buffer pointed to by INS->obufp, marking
+   the style for the character as STYLE.  */
+
+static void
+oappend_char (instr_info *ins, const char c, enum disassembler_style style)
+{
+  oappend_insert_style (ins, style);
+  *ins->obufp++ = c;
+  *ins->obufp = '\0';
+}
+
 static void
 append_seg (instr_info *ins)
 {
@@ -10789,26 +10934,27 @@ append_seg (instr_info *ins)
   switch (ins->active_seg_prefix)
     {
     case PREFIX_CS:
-      oappend_maybe_intel (ins, "%cs:");
+      oappend_maybe_intel_with_style (ins, "%cs", dis_style_register);
       break;
     case PREFIX_DS:
-      oappend_maybe_intel (ins, "%ds:");
+      oappend_maybe_intel_with_style (ins, "%ds", dis_style_register);
       break;
     case PREFIX_SS:
-      oappend_maybe_intel (ins, "%ss:");
+      oappend_maybe_intel_with_style (ins, "%ss", dis_style_register);
       break;
     case PREFIX_ES:
-      oappend_maybe_intel (ins, "%es:");
+      oappend_maybe_intel_with_style (ins, "%es", dis_style_register);
       break;
     case PREFIX_FS:
-      oappend_maybe_intel (ins, "%fs:");
+      oappend_maybe_intel_with_style (ins, "%fs", dis_style_register);
       break;
     case PREFIX_GS:
-      oappend_maybe_intel (ins, "%gs:");
+      oappend_maybe_intel_with_style (ins, "%gs", dis_style_register);
       break;
     default:
       break;
     }
+  oappend_char (ins, ':', dis_style_text);
 }
 
 static void
@@ -11296,7 +11442,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -11560,11 +11706,15 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      print_displacement (ins, ins->scratchbuf, disp);
 	    else
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
-	    oappend (ins, ins->scratchbuf);
+	    oappend_with_style (ins, ins->scratchbuf,
+				dis_style_address_offset);
 	    if (riprel)
 	      {
 		set_op (ins, disp, 1);
-		oappend (ins, !addr32flag ? "(%rip)" : "(%eip)");
+		oappend_char (ins, '(', dis_style_text);
+		oappend_with_style (ins, !addr32flag ? "%rip" : "%eip",
+				    dis_style_register);
+		oappend_char (ins, ')', dis_style_text);
 	      }
 	  }
 
@@ -11578,17 +11728,19 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (havedisp || (ins->intel_syntax && riprel))
 	{
-	  *ins->obufp++ = ins->open_char;
+	  oappend_char (ins, ins->open_char, dis_style_text);
 	  if (ins->intel_syntax && riprel)
 	    {
 	      set_op (ins, disp, 1);
-	      oappend (ins, !addr32flag ? "rip" : "eip");
+	      oappend_with_style (ins, !addr32flag ? "rip" : "eip",
+				  dis_style_register);
 	    }
-	  *ins->obufp = '\0';
 	  if (havebase)
-	    oappend_maybe_intel (ins,
-				 (ins->address_mode == mode_64bit && !addr32flag
-				  ? att_names64 : att_names32)[rbase]);
+	    oappend_maybe_intel_with_style
+	      (ins,
+	       (ins->address_mode == mode_64bit && !addr32flag
+		? att_names64 : att_names32)[rbase],
+	       dis_style_register);
 	  if (ins->has_sib)
 	    {
 	      /* ESP/RSP won't allow index.  If base isn't ESP/RSP,
@@ -11599,14 +11751,12 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		  || (havebase && base != ESP_REG_NUM))
 		{
 		  if (!ins->intel_syntax || havebase)
-		    {
-		      *ins->obufp++ = ins->separator_char;
-		      *ins->obufp = '\0';
-		    }
+		    oappend_char (ins, ins->separator_char, dis_style_text);
 		  if (indexes)
 		    {
 		      if (ins->address_mode == mode_64bit || vindex < 16)
-			oappend_maybe_intel (ins, indexes[vindex]);
+			oappend_maybe_intel_with_style (ins, indexes[vindex],
+							dis_style_register);
 		      else
 			oappend (ins, "(bad)");
 		    }
@@ -11614,26 +11764,22 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		    oappend_maybe_intel (ins,
 					 ins->address_mode == mode_64bit
 					 && !addr32flag ? att_index64
-							: att_index32);
+					 : att_index32);
 
-		  *ins->obufp++ = ins->scale_char;
-		  *ins->obufp = '\0';
+		  oappend_char (ins, ins->scale_char, dis_style_text);
 		  sprintf (ins->scratchbuf, "%d", 1 << scale);
-		  oappend (ins, ins->scratchbuf);
+		  oappend_with_style (ins, ins->scratchbuf,
+				      dis_style_immediate);
 		}
 	    }
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || base == 5))
 	    {
 	      if (!havedisp || (bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		oappend_char (ins, '+', dis_style_text);
 	      else if (ins->modrm.mod != 1 && disp != -disp)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-', dis_style_text);
 		  disp = -disp;
 		}
 
@@ -11644,8 +11790,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      oappend (ins, ins->scratchbuf);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char, dis_style_text);
 
 	  if (check_gather)
 	    {
@@ -11666,7 +11811,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	    {
 	      if (!ins->active_seg_prefix)
 		{
-		  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+		  oappend_maybe_intel_with_style
+		    (ins, att_names_seg[ds_reg - es_reg], dis_style_register);
 		  oappend (ins, ":");
 		}
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
@@ -11722,23 +11868,17 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (ins->modrm.mod != 0 || ins->modrm.rm != 6)
 	{
-	  *ins->obufp++ = ins->open_char;
-	  *ins->obufp = '\0';
-	  oappend (ins,
-		   (ins->intel_syntax ? intel_index16
-				      : att_index16)[ins->modrm.rm]);
+	  oappend_char (ins, ins->open_char, dis_style_text);
+	  oappend (ins, (ins->intel_syntax ? intel_index16
+			 : att_index16)[ins->modrm.rm]);
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || ins->modrm.rm == 6))
 	    {
 	      if ((bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		oappend_char (ins, '+', dis_style_text);
 	      else if (ins->modrm.mod != 1)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-', dis_style_text);
 		  disp = -disp;
 		}
 
@@ -11746,14 +11886,14 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      oappend (ins, ins->scratchbuf);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char, dis_style_text);
 	}
       else if (ins->intel_syntax)
 	{
 	  if (!ins->active_seg_prefix)
 	    {
-	      oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	      oappend_maybe_intel_with_style
+		(ins, att_names_seg[ds_reg - es_reg], dis_style_register);
 	      oappend (ins, ":");
 	    }
 	  print_operand_value (ins, ins->scratchbuf, 1, disp & 0xffff);
@@ -11969,7 +12109,8 @@ OP_REG (instr_info *ins, int code, int sizeflag)
     {
     case es_reg: case ss_reg: case cs_reg:
     case ds_reg: case fs_reg: case gs_reg:
-      oappend_maybe_intel (ins, att_names_seg[code - es_reg]);
+      oappend_maybe_intel_with_style
+	(ins, att_names_seg[code - es_reg], dis_style_register);
       return;
     }
 
@@ -12022,7 +12163,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_maybe_intel_with_style (ins, s, dis_style_register);
 }
 
 static void
@@ -12063,7 +12204,7 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_maybe_intel_with_style (ins, s, dis_style_register);
 }
 
 static void
@@ -12118,7 +12259,7 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
   op &= mask;
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_immediate);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12136,7 +12277,7 @@ OP_I64 (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, get64 (ins));
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_immediate);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12190,7 +12331,7 @@ OP_sI (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_immediate);
 }
 
 static void
@@ -12248,7 +12389,8 @@ static void
 OP_SEG (instr_info *ins, int bytemode, int sizeflag)
 {
   if (bytemode == w_mode)
-    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg]);
+    oappend_maybe_intel_with_style
+      (ins, att_names_seg[ins->modrm.reg], dis_style_register);
   else
     OP_E (ins, ins->modrm.mod == 3 ? bytemode : w_mode, sizeflag);
 }
@@ -12294,12 +12436,13 @@ OP_OFF (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	  oappend_maybe_intel_with_style (ins, att_names_seg[ds_reg - es_reg],
+					  dis_style_register);
 	  oappend (ins, ":");
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12324,12 +12467,14 @@ OP_OFF64 (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	  oappend_maybe_intel_with_style (ins,
+					  att_names_seg[ds_reg - es_reg],
+					  dis_style_register);
 	  oappend (ins, ":");
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12350,9 +12495,8 @@ ptr_reg (instr_info *ins, int code, int sizeflag)
     s = att_names32[code - eAX_reg];
   else
     s = att_names16[code - eAX_reg];
-  oappend_maybe_intel (ins, s);
-  *ins->obufp++ = ins->close_char;
-  *ins->obufp = 0;
+  oappend_maybe_intel_with_style (ins, s, dis_style_register);
+  oappend_char (ins, ins->close_char, dis_style_text);
 }
 
 static void
@@ -12375,7 +12519,8 @@ OP_ESreg (instr_info *ins, int code, int sizeflag)
 	  intel_operand_size (ins, b_mode, sizeflag);
 	}
     }
-  oappend_maybe_intel (ins, "%es:");
+  oappend_maybe_intel_with_style (ins, "%es", dis_style_register);
+  oappend_char (ins, ':', dis_style_text);
   ptr_reg (ins, code, sizeflag);
 }
 
@@ -12470,7 +12615,7 @@ OP_MMX (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -12543,7 +12688,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
     }
   else
     names = att_names_xmm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -12603,7 +12748,7 @@ OP_EM (instr_info *ins, int bytemode, int sizeflag)
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
 }
 
 /* cvt* are the only instructions in sse2 which have
@@ -12629,7 +12774,8 @@ OP_EMC (instr_info *ins, int bytemode, int sizeflag)
   MODRM_CHECK;
   ins->codep++;
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm]);
+  oappend_maybe_intel_with_style (ins, att_names_mm[ins->modrm.rm],
+				  dis_style_register);
 }
 
 static void
@@ -12813,7 +12959,7 @@ OP_3DNowSuffix (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
   ins->obufp = ins->mnemonicendp;
   mnemonic = Suffix3DNow[*ins->codep++ & 0xff];
   if (mnemonic)
-    oappend (ins, mnemonic);
+    ins->obufp = stpcpy (ins->obufp, mnemonic);
   else
     {
       /* Since a variable sized ins->modrm/ins->sib chunk is between the start
@@ -12959,7 +13105,7 @@ BadOp (instr_info *ins)
 {
   /* Throw away prefixes and 1st. opcode byte.  */
   ins->codep = ins->insn_codep + 1;
-  oappend (ins, "(bad)");
+  ins->obufp = stpcpy (ins->obufp, "(bad)");
 }
 
 static void
@@ -13172,7 +13318,8 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   switch (bytemode)
     {
     case scalar_mode:
-      oappend_maybe_intel (ins, att_names_xmm[reg]);
+      oappend_maybe_intel_with_style (ins, att_names_xmm[reg],
+				      dis_style_register);
       return;
 
     case vex_vsib_d_w_dq_mode:
@@ -13183,9 +13330,11 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       if (ins->vex.length == 128
 	  || (bytemode != vex_vsib_d_w_dq_mode
 	      && !ins->vex.w))
-	oappend_maybe_intel (ins, att_names_xmm[reg]);
+	oappend_maybe_intel_with_style (ins, att_names_xmm[reg],
+					dis_style_register);
       else
-	oappend_maybe_intel (ins, att_names_ymm[reg]);
+	oappend_maybe_intel_with_style (ins, att_names_ymm[reg],
+					dis_style_register);
 
       /* All 3 XMM/YMM registers must be distinct.  */
       modrm_reg = ins->modrm.reg;
@@ -13217,7 +13366,8 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  /* This must be the 3rd operand.  */
 	  if (ins->obufp != ins->op_out[2])
 	    abort ();
-	  oappend_maybe_intel (ins, att_names_tmm[reg]);
+	  oappend_maybe_intel_with_style (ins, att_names_tmm[reg],
+					  dis_style_register);
 	  if (reg == ins->modrm.reg || reg == ins->modrm.rm)
 	    strcpy (ins->obufp, "/(bad)");
 	}
@@ -13292,7 +13442,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       abort ();
       break;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
 }
 
 static void
@@ -13335,7 +13485,7 @@ OP_REG_VexI4 (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (bytemode == x_mode && ins->vex.length == 256)
     names = att_names_ymm;
 
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
 
   if (ins->vex.w)
     {
@@ -13352,7 +13502,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 {
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
 }
 
 static void
@@ -13397,7 +13547,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13449,7 +13599,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13497,7 +13647,8 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_maybe_intel_with_style (ins, ins->scratchbuf,
+				      dis_style_immediate);
       ins->scratchbuf[0] = '\0';
     }
 }
-- 
2.25.4



* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-09  9:48         ` Andrew Burgess
  2022-05-09 12:54           ` [PATCHv2] " Andrew Burgess
@ 2022-05-18  7:06           ` Jan Beulich
  2022-05-18 10:41             ` Andrew Burgess
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-18  7:06 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 09.05.2022 11:48, Andrew Burgess wrote:
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> On 03.05.2022 15:12, Andrew Burgess wrote:
>>> I'm still keen to try and go with an escaping based solution, my
>>> reasoning is that I think that this is the solution least likely to
>>> introduce latent disassembler bugs.
>>>
>>> However, that position is based on my belief that there's no exhaustive
>>> test for the i386 based disassembler, i.e. one that tests every single
>>> valid instruction disassembles correctly.  If there was such a test then
>>> I might be more tempted to try something more radical...
>>>
>>> That said, if I was going to stick with an escaping scheme, then I have
>>> some ideas for moving forward.
>>>
>>> The current scheme relies on the fact that symbols are not printed
>>> directly from the i386 disassembler, instead the i386 disassembler calls
>>> back into the driver application (objdump, gdb) to print the symbol.  As
>>> a result, symbols don't go through the instr_info::obuf buffer.  This
>>> means that we never try to interpret a symbol name for escape
>>> characters.
>>
>> Hmm, indeed. I have to admit that I view it as a significant shortcoming
>> of the disassembler that it doesn't resolve addresses in the output. So
>> I'd like to at least not see the road being closed towards improving this.
>>
>>> This means we avoid one of the issues that you raised, what if the
>>> escape character appears in a symbol name; the answer is, I just don't
>>> need to worry about this!
>>>
>>> So, I only need to ensure that the escape character is:
>>>
>>>   (a) not a character that the disassembler currently tries to directly
>>>   print itself, and
>>>
>>>   (b) not something that will ever be printed as part of an immediate.
>>
>> Or, more generally, as part of any kind of operand.
> 
> Sure, but the reason I single out immediates here is I think these are
> the only operand whose content is not statically known within the
> disassembler.
> 
> For example, register operands, every possible register operand value is
> enumerated within the i386-dis.c source file, right?  So when I proposed
> using '~' I could simply search the source file, find no uses, and know
> that character is not (currently) used within a register name.

Indeed. Yet present state is only part of it. See the uses of { and }
that AVX512 has added. Prior to that one could have thought these
characters could easily be used for some special purpose (like your
escaping), too. Hence my pointing out of possible future uses of ~,
with the more general implication that all printable characters would
better be avoided. But you've switched to \002 already anyway afaics.
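
For illustration only, and not taken from the patch: a minimal sketch of
keeping the marker behind a single string definition, so that changing it
(or letting it grow beyond one character) means touching one line.  The
names SKETCH_STYLE_MARKER, sketch_append_marker and sketch_match_marker
are made up here, and the one-hex-digit style encoding is simply assumed
to match what the patch does.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical: the one line to change if the marker must differ.  It is
   a string rather than a char, so it may be longer than one character.  */
#define SKETCH_STYLE_MARKER "\002"

/* Append MARKER, one hex digit encoding STYLE, and MARKER again to BUF
   (capacity SIZE, already NUL terminated).  Return 0 on success, or -1 if
   STYLE does not fit in one hex digit or BUF is too small.  */
int
sketch_append_marker (char *buf, size_t size, unsigned int style)
{
  size_t used = strlen (buf);
  size_t need = 2 * strlen (SKETCH_STYLE_MARKER) + 1;

  if (style > 0xf || used + need + 1 > size)
    return -1;
  snprintf (buf + used, size - used, "%s%x%s",
	    SKETCH_STYLE_MARKER, style, SKETCH_STYLE_MARKER);
  return 0;
}

/* If S starts with a complete marker, store its style in *STYLE and
   return the marker's length; otherwise return 0.  */
size_t
sketch_match_marker (const char *s, unsigned int *style)
{
  static const char digits[] = "0123456789abcdef";
  size_t mlen = strlen (SKETCH_STYLE_MARKER);
  const char *d;

  if (strncmp (s, SKETCH_STYLE_MARKER, mlen) != 0 || s[mlen] == '\0')
    return 0;
  d = strchr (digits, s[mlen]);
  if (d == NULL
      || strncmp (s + mlen + 1, SKETCH_STYLE_MARKER, mlen) != 0)
    return 0;
  *style = (unsigned int) (d - digits);
  return 2 * mlen + 1;
}
```

A caller would use sketch_match_marker at each position of the output
buffer and skip the returned number of bytes whenever it is non-zero.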

> Immediates are different though, for them we rely on libc to generate
> the textual representation.

Yet even then we know the set of characters libc might use.

> The only other operand type that might contain "unknown" characters
> would be a field that contains an address and potentially a symbol name,
> but as was already discussed, these are not printed through the
> disassembler.

Hmm, yes. This behavior is so extremely counterintuitive to me that
I keep forgetting. Not the least because in many cases a symbol
name isn't printed at all even when one could be known. So yes, if
->print_address_func() doesn't look for escapes, then indeed all
should be fine right now.

> My question then, other than the exceptions I've already listed, are
> there other types of operand where the content doesn't already exist
> within i386-dis.c?

I don't think there is right now.

>>> Clearly my choice passes both right now, but looks like it will not pass
>>> (b) forever.
>>>
>>> One possible solution would be to replace all the remaining places where
>>> we directly write to instr_info::obuf with calls to oappend_char.
>>
>> I guess this might be troublesome. The way the disassembler works is a
>> little quirky here and there, and hence one needs to play tricks every
>> now and then to half-way reasonably deal with certain special cases.
>>
>>>  I
>>> could then extend the oappend API such that we do "real" escaping, that
>>> is (assuming the continued use of '~' for now): '~X' would indicate a
>>> style marker, with X being the style number, and '~~' would indicate a
>>> literal '~' character.  In this way we really wouldn't care which
>>> character we used (though we'd probably pick one that didn't crop up too
>>> often just for ease of parsing the buffers).
>>>
>>> An alternative solution would be to pick a non-printable character,
>>> e.g. \001, and use this as the escape character in place of the current
>>> '~'.  This seems to pass the (a) and (b) tests above, and if such a
>>> character does ever appear in a symbol name, then, as I've said above, I
>>> don't believe this would cause us any problems.
>>
>> I suppose \001 (or a character very close to this, as iirc \001 has
>> some meaning internally in gas, and I'm not entirely certain none of
>> these uses can ever "escape" gas) is good to start with. Provided it
>> is properly abstracted so it can, if necessary, be _very_ easily
>> changed (by modifying exactly one line, or - if you need both a
>> single-quoted and a double-quoted instance - two adjacent ones).
>>
>> Albeit, thinking of this last aspect, maybe it would be better to
>> only have a double-quoted instance in the first place, and allow
>> for the escape to be more than a single character if need be ...
>>
>> And yes - if a symbol name was possible to hit and if that symbol
>> name contained such an escape sequence, aiui the worst that would
>> happen is bogus coloring? IOW the escape would not be looked for and
>> replaced / processed when coloring is disabled?
> 
> Unfortunately this is not correct.  The disassembler always sends
> styling information to the user (objdump, gdb, etc), it's the user that
> decides if the output should be styled or not.
> 
> What this means is that if the disassembler encountered a random symbol
> (which would be a pretty big change to the disassembler), and the symbol
> did include something like ~a~ (using the current character to make it
> more readable here), then the whole '~a~' part would disappear from the
> symbol name, this would be seen as a style marker, the text up to the
> start of '~a~' should take the previous style, and the text after '~a~'
> would take the '0xa' style, but the '~a~' itself would always be
> stripped out.
> 
> One relatively easy solution here would be to say that, when we add the
> ability to include symbol names in the disassembler output buffers, at
> that point we can add "true" escaping.  So if your symbol name is
> 'foo~a~bar' then as this is added to the disassembler buffer we would
> actually add 'foo~~a~~bar', and we'd extend the code that parses out
> styling information so that it could handle this case.  This feels like
> it should be easy enough to do.
> 
> All we then have to do is convince ourselves that there's no way for the
> escape character to make it into the disassembler output from any other
> source, and we should be fine.
> 
> For example, your concern about \001 escaping from gas.  Other than
> within a symbol name, how might the disassembler end up trying to print
> this byte?

As per above, I was wrong, simply because I find the disassembler behavior
here rather bogus.
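
For reference, the "true" escaping idea quoted above (double the marker
character when a symbol name is copied into an output buffer) could be
sketched roughly as follows.  This is only an illustration: demo_escape
and DEMO_MARKER are made-up names, '~' is used purely for readability,
and nothing like this exists in i386-dis.c today.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical marker character; the thread uses '~' for readability.  */
#define DEMO_MARKER '~'

/* Copy SRC into DST (of size DST_SIZE), writing every DEMO_MARKER twice so
   that a later parser can tell a literal marker from a style marker.
   Returns the number of characters written, not counting the terminating
   NUL, or (size_t) -1 if the result does not fit.  */
static size_t
demo_escape (char *dst, size_t dst_size, const char *src)
{
  size_t n = 0;

  assert (dst != NULL && src != NULL);

  for (; *src != '\0'; ++src)
    {
      size_t need = (*src == DEMO_MARKER) ? 2 : 1;

      /* Always leave room for the terminating NUL.  */
      if (n + need >= dst_size)
        return (size_t) -1;

      if (*src == DEMO_MARKER)
        dst[n++] = DEMO_MARKER;
      dst[n++] = *src;
    }

  if (n >= dst_size)
    return (size_t) -1;

  dst[n] = '\0';
  return n;
}
```

With this, 'foo~a~bar' would be stored as 'foo~~a~~bar', matching the
example given in the quoted text; the parsing side would then need to
treat '~~' as a literal '~'.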

Jan


* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-18  7:06           ` [PATCH 2/2] " Jan Beulich
@ 2022-05-18 10:41             ` Andrew Burgess
  2022-05-18 10:46               ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Burgess @ 2022-05-18 10:41 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 09.05.2022 11:48, Andrew Burgess wrote:
>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>> On 03.05.2022 15:12, Andrew Burgess wrote:
>>>> I'm still keen to try and go with an escaping based solution, my
>>>> reasoning is that I think that this is the solution least likely to
>>>> introduce latent disassembler bugs.
>>>>
>>>> However, that position is based on my belief that there's no exhaustive
>>>> test for the i386 based disassembler, i.e. one that tests every single
>>>> valid instruction disassembles correctly.  If there was such a test then
>>>> I might be more tempted to try something more radical...
>>>>
>>>> That said, if I was going to stick with an escaping scheme, then I have
>>>> some ideas for moving forward.
>>>>
>>>> The current scheme relies on the fact that symbols are not printed
>>>> directly from the i386 disassembler, instead the i386 disassembler calls
>>>> back into the driver application (objdump, gdb) to print the symbol.  As
>>>> a result, symbols don't go through the instr_info::obuf buffer.  This
>>>> means that we never try to interpret a symbol name for escape
>>>> characters.
>>>
>>> Hmm, indeed. I have to admit that I view it as a significant shortcoming
>>> of the disassembler that it doesn't resolve addresses in the output. So
>>> I'd like to at least not see the road being closed towards improving this.
>>>
>>>> This means we avoid one of the issues that you raised, what if the
>>>> escape character appears in a symbol name; the answer is, I just don't
>>>> need to worry about this!
>>>>
>>>> So, I only need to ensure that the escape character is:
>>>>
>>>>   (a) not a character that the disassembler currently tries to directly
>>>>   print itself, and
>>>>
>>>>   (b) not something that will ever be printed as part of an immediate.
>>>
>>> Or, more generally, as part of any kind of operand.
>> 
>> Sure, but the reason I single out immediates here is I think these are
>> the only operand whose content is not statically known within the
>> disassembler.
>> 
>> For example, register operands, every possible register operand value is
>> enumerated within the i386-dis.c source file, right?  So when I proposed
>> using '~' I could simply search the source file, find no uses, and know
>> that character is not (currently) used within a register name.
>
> Indeed. Yet present state is only part of it. See the uses of { and }
> that AVX512 has added. Prior to that one could have thought these
> characters could easily be used for some special purpose (like your
> escaping), too. Hence my pointing out of possible future uses of ~,
> with the more general implication that all printable characters would
> better be avoided. But you've switched to \002 already anyway afaics.
>
>> Immediates are different though, for them we rely on libc to generate
>> the textual representation.
>
> Yet even then we know the set of characters libc might use.
>
>> The only other operand type that might contain "unknown" characters
>> would be a field that contains an address and potentially a symbol name,
>> but as was already discussed, these are not printed through the
>> disassembler.
>
> Hmm, yes. This behavior is so extremely counterintuitive to me that
> I keep forgetting. Not the least because in many cases a symbol
> name isn't printed at all even when one could be known. So yes, if
> ->print_address_func() doesn't look for escapes, then indeed all
> should be fine right now.
>
>> My question then, other than the exceptions I've already listed, are
>> there other types of operand where the content doesn't already exist
>> within i386-dis.c?
>
> I don't think there is right now.
>
>>>> Clearly my choice passes both right now, but looks like it will not pass
>>>> (b) forever.
>>>>
>>>> One possible solution would be to replace all the remaining places where
>>>> we directly write to instr_info::obuf with calls to oappend_char.
>>>
>>> I guess this might be troublesome. The way the disassembler works is a
>>> little quirky here and there, and hence one needs to play tricks every
>>> now and then to half-way reasonably deal with certain special cases.
>>>
>>>>  I
>>>> could then extend the oappend API such that we do "real" escaping, that
>>>> is (assuming the continued use of '~' for now): '~X' would indicate a
>>>> style marker, with X being the style number, and '~~' would indicate a
>>>> literal '~' character.  In this way we really wouldn't care which
>>>> character we used (though we'd probably pick one that didn't crop up too
>>>> often just for ease of parsing the buffers).
>>>>
>>>> An alternative solution would be to pick a non-printable character,
>>>> e.g. \001, and use this as the escape character in place of the current
>>>> '~'.  This seems to pass the (a) and (b) tests above, and if such a
>>>> character does ever appear in a symbol name, then, as I've said above, I
>>>> don't believe this would cause us any problems.
>>>
>>> I suppose \001 (or a character very close to this, as iirc \001 has
>>> some meaning internally in gas, and I'm not entirely certain none of
>>> these uses can ever "escape" gas) is good to start with. Provided it
>>> is properly abstracted so it can, if necessary, be _very_ easily
>>> changed (by modifying exactly one line, or - if you need both a
>>> single-quoted and a double-quoted instance - two adjacent ones).
>>>
>>> Albeit, thinking of this last aspect, maybe it would be better to
>>> only have a double-quoted instance in the first place, and allow
>>> for the escape to be more than a single character if need be ...
>>>
>>> And yes - if a symbol name was possible to hit and if that symbol
>>> name contained such an escape sequence, aiui the worst that would
>>> happen is bogus coloring? IOW the escape would not be looked for and
>>> replaced / processed when coloring is disabled?
>> 
>> Unfortunately this is not correct.  The disassembler always sends
>> styling information to the user (objdump, gdb, etc), it's the user that
>> decides if the output should be styled or not.
>> 
>> What this means is that if the disassembler encountered a random symbol
>> (which would be a pretty big change to the disassembler), and the symbol
>> did include something like ~a~ (using the current character to make it
>> more readable here), then the whole '~a~' part would disappear from the
>> symbol name, this would be seen as a style marker, the text up to the
>> start of '~a~' should take the previous style, and the text after '~a~'
>> would take the '0xa' style, but the '~a~' itself would always be
>> stripped out.
>> 
>> One relatively easy solution here would be to say that, when we add the
>> ability to include symbol names in the disassembler output buffers, at
>> that point we can add "true" escaping.  So if your symbol name is
>> 'foo~a~bar' then as this is added to the disassembler buffer we would
>> actually add 'foo~~a~~bar', and we'd extend the code that parses out
>> styling information so that it could handle this case.  This feels like
>> it should be easy enough to do.
>> 
>> All we then have to do is convince ourselves that there's no way for the
>> escape character to make it into the disassembler output from any other
>> source, and we should be fine.
>> 
>> For example, your concern about \001 escaping from gas.  Other than
>> within a symbol name, how might the disassembler end up trying to print
>> this byte?
>
> As per above, I was wrong, simply because I find the disassembler behavior
> here rather bogus.

Jan,

Thanks for your feedback, this all sounds really positive now.

Is there anything else you'd like me to change with this patch before it
can be merged?

Thanks,
Andrew


* Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
  2022-05-18 10:41             ` Andrew Burgess
@ 2022-05-18 10:46               ` Jan Beulich
  0 siblings, 0 replies; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 10:46 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 18.05.2022 12:41, Andrew Burgess wrote:
> Is there anything else you'd like me to change with this patch before it
> can be merged?

I still need to get around to looking at v2. I also think it should rather be
H.J. to finally approve the change.

Jan


* Re: [PATCHv2] libopcodes: extend the styling within the i386 disassembler
  2022-05-09 12:54           ` [PATCHv2] " Andrew Burgess
@ 2022-05-18 12:27             ` Jan Beulich
  2022-05-26 12:48               ` Andrew Burgess
  2022-05-18 21:23             ` H.J. Lu
  2022-05-27 17:44             ` [PATCHv3] " Andrew Burgess
  2 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-18 12:27 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 09.05.2022 14:54, Andrew Burgess via Binutils wrote:
> @@ -248,6 +254,8 @@ struct instr_info
>  
>    enum x86_64_isa isa64;
>  
> +  int (*printf) (instr_info *ins, enum disassembler_style style,
> +		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>  };

Why do you go through a function pointer? Afaics it's only ever set
to i386_dis_printf(), so I don't see why you couldn't call the
function directly.

> @@ -9748,24 +9839,28 @@ print_insn (bfd_vma pc, instr_info *ins)
>  	if (name == NULL)
>  	  abort ();
>  	prefix_length += strlen (name) + 1;
> -	(*ins->info->fprintf_styled_func)
> -	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
> +	ins->printf (ins, dis_style_mnemonic, "%s ", name);
>        }
>  
>    /* Check maximum code length.  */
>    if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
>      {
> -      (*ins->info->fprintf_styled_func)
> -	(ins->info->stream, dis_style_text, "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return MAX_CODE_LENGTH;
>      }
>  
> -  ins->obufp = ins->mnemonicendp;
> -  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
> -    oappend (ins, " ");
> -  oappend (ins, " ");
> -  (*ins->info->fprintf_styled_func)
> -    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
> +  i = strlen (ins->obuf);
> +  if (ins->mnemonicendp == ins->obuf + i)

What is this condition for? It doesn't look to match any of what the
original code does. In particular it's unclear to me ...

> +    {
> +      i += prefix_length;
> +      if (i < 6)
> +	i = 6 - i + 1;
> +      else
> +	i = 1;
> +    }
> +  else
> +    i = 0;

... what this "else" would cover.

> @@ -10224,8 +10314,11 @@ static void
>  OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>  	int sizeflag ATTRIBUTE_UNUSED)
>  {
> -  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, "%st");
> +  oappend (ins, "(");

Any reason these last two aren't simply

  oappend_maybe_intel (ins, "%st(");

?

> +  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
> +  oappend_with_style (ins, ins->scratchbuf, dis_style_immediate);

This is not an immediate. The entire %st(N) is a register name (like
anything that starts with % in AT&T mode).

> @@ -10772,12 +10865,64 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>    return 0;
>  }
>  
> +/* Add a style marker to *INS->obufp that encodes STYLE.  This assumes that
> +   the buffer pointed to by INS->obufp has space.  A style marker is made
> +   from the STYLE_MARKER_CHAR followed by STYLE converted to a single hex
> +   digit, followed by another STYLE_MARKER_CHAR.  This function assumes
> +   that the number of styles is not greater than 16.  */
> +
>  static void
> -oappend (instr_info *ins, const char *s)
> +oappend_insert_style (instr_info *ins, enum disassembler_style style)
>  {
> +  int num = (int) style;
> +
> +  /* We currently assume that STYLE can be encoded as a single hex
> +     character.  If more styles are added then this might start to fail,
> +     and we'll need to expand this code.  */
> +  if (num > 0xf)
> +    abort ();

You want to either also check for negative values or make "num" unsigned.
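
A minimal sketch of the unsigned variant, for illustration only.  The
demo_* names are hypothetical stand-ins rather than the patch's code, and
the sketch returns NULL where the patch calls abort (), simply so the
failure path can be exercised.

```c
#include <assert.h>
#include <stddef.h>

#define STYLE_MARKER_CHAR '\002'

/* Hypothetical stand-in for enum disassembler_style.  */
enum demo_style { demo_style_text = 0, demo_style_register = 4 };

/* Write a three character style marker for STYLE at DST, followed by a
   NUL.  Returns a pointer to that NUL, or NULL when STYLE cannot be
   encoded as a single hex digit.  DST must have room for four chars.  */
static char *
demo_insert_style (char *dst, enum demo_style style)
{
  unsigned int num = (unsigned int) style;

  assert (dst != NULL);

  /* Because NUM is unsigned, this one comparison also rejects values
     that were negative before the conversion.  */
  if (num > 0xf)
    return NULL;

  *dst++ = STYLE_MARKER_CHAR;
  *dst++ = "0123456789abcdef"[num];
  *dst++ = STYLE_MARKER_CHAR;
  *dst = '\0';
  return dst;
}
```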

> @@ -10789,26 +10934,27 @@ append_seg (instr_info *ins)
>    switch (ins->active_seg_prefix)
>      {
>      case PREFIX_CS:
> -      oappend_maybe_intel (ins, "%cs:");
> +      oappend_maybe_intel_with_style (ins, "%cs", dis_style_register);

I was about to ask why dis_style_register needs specifying here, but I
notice the comment ahead of the function is misleading. There also are
cases where a leading '$' would be skipped. I wonder though whether it
wouldn't yield better readable code if those used a separate function,
thus eliminating the need for the explicit style parameter. E.g.
oappend_register() and oappend_immediate(). The "maybe_intel" part of
the name isn't really useful imo.
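
Roughly what such a split could look like, as a self-contained sketch.
Everything here (struct demo_ins, demo_append and the two wrappers) is
hypothetical and only models the idea; the real code would sit on top of
oappend_with_style and instr_info.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the disassembler's style enum and state.  */
enum demo_style { demo_text, demo_register, demo_immediate };

struct demo_ins
{
  int intel_syntax;		/* 1 in Intel syntax, 0 in AT&T syntax.  */
  char obuf[64];		/* Output buffer, always NUL terminated.  */
  enum demo_style last_style;	/* Style of the most recent append.  */
};

/* Append S to INS->obuf and record STYLE.  */
static void
demo_append (struct demo_ins *ins, const char *s, enum demo_style style)
{
  assert (strlen (ins->obuf) + strlen (s) < sizeof (ins->obuf));
  strcat (ins->obuf, s);
  ins->last_style = style;
}

/* S is a register name starting with '%'; Intel syntax drops the '%'.  */
static void
demo_oappend_register (struct demo_ins *ins, const char *s)
{
  assert (s[0] == '%');
  demo_append (ins, s + ins->intel_syntax, demo_register);
}

/* S is an immediate starting with '$'; Intel syntax drops the '$'.  */
static void
demo_oappend_immediate (struct demo_ins *ins, const char *s)
{
  assert (s[0] == '$');
  demo_append (ins, s + ins->intel_syntax, demo_immediate);
}
```

Callers would then pick the wrapper by operand kind, and the style no
longer needs to be spelled out at every call site.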

> @@ -13352,7 +13502,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>  {
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
>  }
>  
>  static void
> @@ -13397,7 +13547,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -13449,7 +13599,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
>        ins->scratchbuf[0] = '\0';
>      }
>  }

Why "text" for these three immediates, but ...

> @@ -13497,7 +13647,8 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf,
> +				      dis_style_immediate);
>        ins->scratchbuf[0] = '\0';
>      }
>  }

... "immediate" here?

Jan


* Re: [PATCHv2] libopcodes: extend the styling within the i386 disassembler
  2022-05-09 12:54           ` [PATCHv2] " Andrew Burgess
  2022-05-18 12:27             ` Jan Beulich
@ 2022-05-18 21:23             ` H.J. Lu
  2022-05-27 17:44             ` [PATCHv3] " Andrew Burgess
  2 siblings, 0 replies; 29+ messages in thread
From: H.J. Lu @ 2022-05-18 21:23 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: Binutils

On Mon, May 9, 2022 at 5:54 AM Andrew Burgess via Binutils
<binutils@sourceware.org> wrote:
>
> In patch v2:
>
>   - Addressed all minor feedback items from Vladimir, H.J. and Jan,
>
>   - Switched to using \002 as the styling escape character,
>
>   - Escape character is defined once near the top of i386-dis.c making
>     it easy to switch to a different character if needed,
>
>   - Detection of the style escape character is stricter in
>     i386_dis_printf,
>
>   - Proper error handling in i386_dis_printf, though I can't imagine
>     when this would actually trigger.
>
> ---
>
> The i386 disassembler is pretty complex.  Most disassembly is done
> indirectly; operands are built into buffers within a struct instr_info
> instance, before finally being printed later in the disassembly
> process.
>
> Sometimes the operand buffers are built in a different order to the
> order in which they will eventually be printed.
>
> Each operand can contain multiple components, e.g. multiple registers,
> immediates, other textual elements (commas, brackets, etc).
>
> When looking for how to apply styling I guess the ideal solution would
> be to move away from the operands being a single string that is built
> up, and instead have each operand be a list of "parts", where each
> part is some text and a style.  Then, when we eventually print the
> operand we would loop over the parts and print each part with the
> correct style.
>
> But it feels like a huge amount of work to move from where we are
> now to that potentially ideal solution.  Plus, the above solution
> would be pretty complex.
>
> So, instead I propose a .... different solution here, one that works
> with the existing infrastructure.
>
> As each operand is built up, piece by piece, we pass through style
> information.  This style information is then encoded into the operand
> buffer (see below for details).  After this the code can continue to
> operate as it does right now in order to manage the set of operand
> buffers.
>
> Then, as each operand is printed we can split the operand buffer into
> chunks at the style marker boundaries, with each chunk being printed
> with the correct style.
>
> For encoding the style information I use a single character, currently
> \002, followed by the style encoded as a single hex digit, followed
> again by the \002 character.
>
> This of course relies on there not being more than 16 styles, but that
> is currently true, and hopefully will remain true for the foreseeable
> future.
>
> The other major concern that has arisen around this work is whether
> the escape character could ever be encountered in output naturally
> generated by the disassembler.  If this did happen then the escape
> characters would be stripped from the output, and the wrong styling
> would be applied.
>
> However, I don't believe that this is currently a problem.
> Disassembler content comes from a number of sources.  First there's
> content that is copied directly from the i386-dis.c file, this is things
> like register names, and other syntax elements (brackets, commas,
> etc).  We can easily check that the i386-dis.c file doesn't contain
> our special character.
>
> The next source of content are immediate operands.  The text for these
> operands is generated by calls into libc.  By selecting a
> non-printable character we can be confident that this is not something
> that libc will generate as part of an immediate representation.
>
> The other output that appears to be from the disassembler is operands
> that contain addresses and (possibly) symbol names.  It is quite
> possible that a symbol name might contain any special character we
> could imagine, so is this a problem?
>
> I don't think it is, we don't actually print address and symbol
> operands through the disassembler, instead, the disassembler calls
> back to the user (objdump, gdb, etc) to print the address and symbol
> on its behalf.  This content is printed directly to the output stream,
> it does not pass through the i386 disassembler output buffers.  As a
> result, we never check this particular output for styling escape
> characters.
>
> In some (not very scientific) benchmarking on my machine,
> disassembling a reasonably large (142M) shared library, I'm not seeing
> any significant slow down in disassembler speed with this change.
>
> Most instructions are now being fully syntax highlighted when I
> disassemble using the --disassembler-color=extended-color option.  I'm
> sure that there are probably still a few corner cases that need fixing
> up, but we can come back to them later I think.
>
> When disassembler syntax highlighting is not being used, then there
> should be no user visible changes after this commit.
> ---
>  opcodes/i386-dis.c | 405 +++++++++++++++++++++++++++++++--------------
>  1 file changed, 278 insertions(+), 127 deletions(-)
>
> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
> index 6ef091ea7d7..28834e4650b 100644
> --- a/opcodes/i386-dis.c
> +++ b/opcodes/i386-dis.c
> @@ -47,6 +47,8 @@ static void dofloat (instr_info *, int);
>  static void OP_ST (instr_info *, int, int);
>  static void OP_STi (instr_info *, int, int);
>  static int putop (instr_info *, const char *, int);
> +static void oappend_with_style (instr_info *, const char *,
> +                               enum disassembler_style);
>  static void oappend (instr_info *, const char *);
>  static void append_seg (instr_info *);
>  static void OP_indirE (instr_info *, int, int);
> @@ -116,6 +118,10 @@ static void FXSAVE_Fixup (instr_info *, int, int);
>  static void MOVSXD_Fixup (instr_info *, int, int);
>  static void DistinctDest_Fixup (instr_info *, int, int);
>
> +/* This character is used to encode style information within the output
> +   buffers.  See oappend_insert_style for more details.  */
> +#define STYLE_MARKER_CHAR '\002'
> +
>  struct dis_private {
>    /* Points to first byte not fetched.  */
>    bfd_byte *max_fetched;
> @@ -248,6 +254,8 @@ struct instr_info
>
>    enum x86_64_isa isa64;
>
> +  int (*printf) (instr_info *ins, enum disassembler_style style,
> +                const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>  };
>
>  /* Mark parts used in the REX prefix.  When we are testing for
> @@ -9298,11 +9306,103 @@ get_sib (instr_info *ins, int sizeflag)
>  }
>
>  /* Like oappend (below), but S is a string starting with '%'.
> -   In Intel syntax, the '%' is elided.  */
> +   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
> +   part of the output in the disassembler.  */
> +
> +static void
> +oappend_maybe_intel_with_style (instr_info *ins, const char *s,
> +                               enum disassembler_style style)
> +{
> +  oappend_with_style (ins, s + ins->intel_syntax, style);
> +}
> +
> +/* Like oappend_maybe_intel_with_style, but always uses text style.  */
> +
>  static void
>  oappend_maybe_intel (instr_info *ins, const char *s)
>  {
> -  oappend (ins, s + ins->intel_syntax);
> +  oappend_maybe_intel_with_style (ins, s, dis_style_text);
> +}
> +
> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
> +   STYLE is the default style to use in the fprintf_styled_func calls,
> +   however, FMT might include embedded style markers (see oappend_style),
> +   these embedded markers are not printed, but instead change the style
> +   used in the next fprintf_styled_func call.
> +
> +   Return non-zero to indicate the print call was a success.  */
> +
> +static int ATTRIBUTE_PRINTF_3
> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
> +                const char *fmt, ...)
> +{
> +  va_list ap;
> +  enum disassembler_style curr_style = style;
> +  char *start, *curr;
> +  char staging_area[100];
> +  int res;
> +
> +  va_start (ap, fmt);
> +  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
> +  va_end (ap);
> +
> +  if (res < 0)
> +    return res;
> +
> +  start = curr = staging_area;
> +
> +  do
> +    {
> +      if (*curr == '\0'
> +         || (*curr == STYLE_MARKER_CHAR
> +             && ISXDIGIT (*(curr + 1))
> +             && *(curr + 2) == STYLE_MARKER_CHAR))
> +       {
> +         /* Output content between our START position and CURR.  */
> +         int len = curr - start;
> +         int n = (*ins->info->fprintf_styled_func) (ins->info->stream,
> +                                                    curr_style,
> +                                                    "%.*s", len, start);
> +         if (n < 0)
> +           {
> +             res = n;
> +             break;
> +           }
> +
> +         if (*curr == '\0')
> +           break;
> +
> +         /* Skip over the initial STYLE_MARKER_CHAR.  */
> +         ++curr;
> +
> +         /* Update the CURR_STYLE.  As there are less than 16 styles, it
> +            is possible, that if the input is corrupted in some way, that
> +            we might set CURR_STYLE to an invalid value.  Don't worry
> +            though, we check for this situation.  */
> +         if (*curr >= '0' && *curr <= '9')
> +           curr_style = (enum disassembler_style) (*curr - '0');
> +         else if (*curr >= 'a' && *curr <= 'f')
> +           curr_style = (enum disassembler_style) (*curr - 'a' + 10);
> +         else
> +           curr_style = dis_style_text;
> +
> +         /* Check for an invalid style having been selected.  This should
> +            never happen, but it doesn't hurt to be a little paranoid.  */
> +         if (curr_style > dis_style_comment_start)
> +           curr_style = dis_style_text;
> +
> +         /* Skip the hex character, and the closing STYLE_MARKER_CHAR.  */
> +         curr += 2;
> +
> +         /* Reset the START to after the style marker.  */
> +         start = curr;
> +       }
> +      else
> +       ++curr;
> +    }
> +  while (true);
> +
> +  return res;
>  }
>
>  static int
> @@ -9317,6 +9417,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>    struct dis_private priv;
>    int prefix_length;
>
> +  ins->printf = i386_dis_printf;
>    ins->isa64 = 0;
>    ins->intel_mnemonic = !SYSV386_COMPAT;
>    ins->op_is_jump = false;
> @@ -9401,8 +9502,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>
>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -                                        _("64-bit address is disabled"));
> +      ins->printf (ins, dis_style_text, _("64-bit address is disabled"));
>        return -1;
>      }
>
> @@ -9451,16 +9551,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>         {
>           name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>           if (name != NULL)
> -           (*ins->info->fprintf_styled_func)
> -             (ins->info->stream, dis_style_mnemonic, "%s", name);
> +           ins->printf (ins, dis_style_mnemonic, "%s", name);
>           else
>             {
>               /* Just print the first byte as a .byte instruction.  */
> -             (*ins->info->fprintf_styled_func)
> -               (ins->info->stream, dis_style_assembler_directive, ".byte ");
> -             (*ins->info->fprintf_styled_func)
> -               (ins->info->stream, dis_style_immediate, "0x%x",
> -                (unsigned int) priv.the_buffer[0]);
> +             ins->printf (ins, dis_style_assembler_directive,
> +                          ".byte ");
> +             ins->printf (ins, dis_style_immediate, "0x%x",
> +                          (unsigned int) priv.the_buffer[0]);
>             }
>
>           return 1;
> @@ -9478,10 +9576,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>        for (i = 0;
>            i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
>            i++)
> -       (*ins->info->fprintf_styled_func)
> -         (ins->info->stream, dis_style_mnemonic, "%s%s",
> -          (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
> -                                            sizeflag));
> +       ins->printf (ins, dis_style_mnemonic, "%s%s",
> +                    (i == 0 ? "" : " "),
> +                    prefix_name (ins, ins->all_prefixes[i], sizeflag));
>        return i;
>      }
>
> @@ -9496,11 +9593,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>        /* Handle ins->prefixes before fwait.  */
>        for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
>            i++)
> -       (*ins->info->fprintf_styled_func)
> -         (ins->info->stream, dis_style_mnemonic, "%s ",
> -          prefix_name (ins, ins->all_prefixes[i], sizeflag));
> -      (*ins->info->fprintf_styled_func)
> -       (ins->info->stream, dis_style_mnemonic, "fwait");
> +       ins->printf (ins, dis_style_mnemonic, "%s ",
> +                    prefix_name (ins, ins->all_prefixes[i], sizeflag));
> +      ins->printf (ins, dis_style_mnemonic, "fwait");
>        return i + 1;
>      }
>
> @@ -9649,16 +9744,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>       are all 0s in inverted form.  */
>    if (ins->need_vex && ins->vex.register_specifier != 0)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -                                        "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return ins->end_codep - priv.the_buffer;
>      }
>
>    /* If EVEX.z is set, there must be an actual mask register in use.  */
>    if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -                                        "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return ins->end_codep - priv.the_buffer;
>      }
>
> @@ -9669,8 +9762,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>          the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
>        if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
>         {
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, "(bad)");
> +         ins->printf (ins, dis_style_text, "(bad)");
>           return ins->end_codep - priv.the_buffer;
>         }
>        ins->used_prefixes |= PREFIX_DATA;
> @@ -9697,8 +9789,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>           || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
>               && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
>         {
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, "(bad)");
> +         ins->printf (ins, dis_style_text, "(bad)");
>           return ins->end_codep - priv.the_buffer;
>         }
>        break;
> @@ -9748,24 +9839,28 @@ print_insn (bfd_vma pc, instr_info *ins)
>         if (name == NULL)
>           abort ();
>         prefix_length += strlen (name) + 1;
> -       (*ins->info->fprintf_styled_func)
> -         (ins->info->stream, dis_style_mnemonic, "%s ", name);
> +       ins->printf (ins, dis_style_mnemonic, "%s ", name);
>        }
>
>    /* Check maximum code length.  */
>    if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
>      {
> -      (*ins->info->fprintf_styled_func)
> -       (ins->info->stream, dis_style_text, "(bad)");
> +      ins->printf (ins, dis_style_text, "(bad)");
>        return MAX_CODE_LENGTH;
>      }
>
> -  ins->obufp = ins->mnemonicendp;
> -  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
> -    oappend (ins, " ");
> -  oappend (ins, " ");
> -  (*ins->info->fprintf_styled_func)
> -    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
> +  i = strlen (ins->obuf);
> +  if (ins->mnemonicendp == ins->obuf + i)
> +    {
> +      i += prefix_length;
> +      if (i < 6)
> +       i = 6 - i + 1;
> +      else
> +       i = 1;
> +    }
> +  else
> +    i = 0;
> +  ins->printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
>
>    /* The enter and bound instructions are printed with operands in the same
>       order as the intel book; everything else is printed in reverse order.  */
> @@ -9804,8 +9899,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>      if (*op_txt[i])
>        {
>         if (needcomma)
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, ",");
> +         ins->printf (ins, dis_style_text, ",");
>         if (ins->op_index[i] != -1 && !ins->op_riprel[i])
>           {
>             bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
> @@ -9821,18 +9915,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>             (*ins->info->print_address_func) (target, ins->info);
>           }
>         else
> -         (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                            dis_style_text, "%s",
> -                                            op_txt[i]);
> +         ins->printf (ins, dis_style_text, "%s", op_txt[i]);
>         needcomma = 1;
>        }
>
>    for (i = 0; i < MAX_OPERANDS; i++)
>      if (ins->op_index[i] != -1 && ins->op_riprel[i])
>        {
> -       (*ins->info->fprintf_styled_func) (ins->info->stream,
> -                                          dis_style_comment_start,
> -                                          "        # ");
> +       ins->printf (ins, dis_style_comment_start, "        # ");
>         (*ins->info->print_address_func) ((bfd_vma)
>                         (ins->start_pc + (ins->codep - ins->start_codep)
>                          + ins->op_address[ins->op_index[i]]), ins->info);
> @@ -10224,8 +10314,11 @@ static void
>  OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>         int sizeflag ATTRIBUTE_UNUSED)
>  {
> -  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel (ins, "%st");
> +  oappend (ins, "(");
> +  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
> +  oappend_with_style (ins, ins->scratchbuf, dis_style_immediate);
> +  oappend (ins, ")");
>  }
>
>  /* Capital letters in template are macros.  */
> @@ -10772,12 +10865,64 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>    return 0;
>  }
>
> +/* Add a style marker to *INS->obufp that encodes STYLE.  This assumes that
> +   the buffer pointed to by INS->obufp has space.  A style marker is made
> +   from the STYLE_MARKER_CHAR followed by STYLE converted to a single hex
> +   digit, followed by another STYLE_MARKER_CHAR.  This function assumes
> +   that the number of styles is not greater than 16.  */
> +
>  static void
> -oappend (instr_info *ins, const char *s)
> +oappend_insert_style (instr_info *ins, enum disassembler_style style)
>  {
> +  int num = (int) style;
> +
> +  /* We currently assume that STYLE can be encoded as a single hex
> +     character.  If more styles are added then this might start to fail,
> +     and we'll need to expand this code.  */
> +  if (num > 0xf)
> +    abort ();
> +
> +  *ins->obufp++ = STYLE_MARKER_CHAR;
> +  *ins->obufp++ = (num < 10 ? ('0' + num)
> +                  : ((num < 16) ? ('a' + (num - 10)) : '0'));
> +  *ins->obufp++ = STYLE_MARKER_CHAR;
> +
> +  /* This final null character is not strictly necessary, after inserting a
> +     style marker we should always be inserting some additional content.
> +     However, having the buffer null terminated doesn't cost much, and makes
> +     it easier to debug what's going on.  Also, if we do ever forget to add
> +     any additional content after this style marker, then the buffer will
> +     still be well formed.  */
> +  *ins->obufp = '\0';
> +}
> +
> +static void
> +oappend_with_style (instr_info *ins, const char *s,
> +                   enum disassembler_style style)
> +{
> +  oappend_insert_style (ins, style);
>    ins->obufp = stpcpy (ins->obufp, s);
>  }
>
> +/* Like oappend_with_style but always with text style.  */
> +
> +static void
> +oappend (instr_info *ins, const char *s)
> +{
> +  oappend_with_style (ins, s, dis_style_text);
> +}
> +
> +/* Add a single character C to the buffer pointed to by INS->obufp, marking
> +   the style for the character as STYLE.  */
> +
> +static void
> +oappend_char (instr_info *ins, const char c, enum disassembler_style style)
> +{
> +  oappend_insert_style (ins, style);
> +  *ins->obufp++ = c;
> +  *ins->obufp = '\0';
> +}
> +
>  static void
>  append_seg (instr_info *ins)
>  {
> @@ -10789,26 +10934,27 @@ append_seg (instr_info *ins)
>    switch (ins->active_seg_prefix)
>      {
>      case PREFIX_CS:
> -      oappend_maybe_intel (ins, "%cs:");
> +      oappend_maybe_intel_with_style (ins, "%cs", dis_style_register);
>        break;
>      case PREFIX_DS:
> -      oappend_maybe_intel (ins, "%ds:");
> +      oappend_maybe_intel_with_style (ins, "%ds", dis_style_register);
>        break;
>      case PREFIX_SS:
> -      oappend_maybe_intel (ins, "%ss:");
> +      oappend_maybe_intel_with_style (ins, "%ss", dis_style_register);
>        break;
>      case PREFIX_ES:
> -      oappend_maybe_intel (ins, "%es:");
> +      oappend_maybe_intel_with_style (ins, "%es", dis_style_register);
>        break;
>      case PREFIX_FS:
> -      oappend_maybe_intel (ins, "%fs:");
> +      oappend_maybe_intel_with_style (ins, "%fs", dis_style_register);
>        break;
>      case PREFIX_GS:
> -      oappend_maybe_intel (ins, "%gs:");
> +      oappend_maybe_intel_with_style (ins, "%gs", dis_style_register);
>        break;
>      default:
>        break;
>      }
> +  oappend_char (ins, ':', dis_style_text);
>  }
>
>  static void
> @@ -11296,7 +11442,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
>        oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
>        return;
>      }
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -11560,11 +11706,15 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               print_displacement (ins, ins->scratchbuf, disp);
>             else
>               print_operand_value (ins, ins->scratchbuf, 1, disp);
> -           oappend (ins, ins->scratchbuf);
> +           oappend_with_style (ins, ins->scratchbuf,
> +                               dis_style_address_offset);
>             if (riprel)
>               {
>                 set_op (ins, disp, 1);
> -               oappend (ins, !addr32flag ? "(%rip)" : "(%eip)");
> +               oappend_char (ins, '(', dis_style_text);
> +               oappend_with_style (ins, !addr32flag ? "%rip" : "%eip",
> +                                   dis_style_register);
> +               oappend_char (ins, ')', dis_style_text);
>               }
>           }
>
> @@ -11578,17 +11728,19 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>
>        if (havedisp || (ins->intel_syntax && riprel))
>         {
> -         *ins->obufp++ = ins->open_char;
> +         oappend_char (ins, ins->open_char, dis_style_text);
>           if (ins->intel_syntax && riprel)
>             {
>               set_op (ins, disp, 1);
> -             oappend (ins, !addr32flag ? "rip" : "eip");
> +             oappend_with_style (ins, !addr32flag ? "rip" : "eip",
> +                                 dis_style_register);
>             }
> -         *ins->obufp = '\0';
>           if (havebase)
> -           oappend_maybe_intel (ins,
> -                                (ins->address_mode == mode_64bit && !addr32flag
> -                                 ? att_names64 : att_names32)[rbase]);
> +           oappend_maybe_intel_with_style
> +             (ins,
> +              (ins->address_mode == mode_64bit && !addr32flag
> +               ? att_names64 : att_names32)[rbase],
> +              dis_style_register);
>           if (ins->has_sib)
>             {
>               /* ESP/RSP won't allow index.  If base isn't ESP/RSP,
> @@ -11599,14 +11751,12 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>                   || (havebase && base != ESP_REG_NUM))
>                 {
>                   if (!ins->intel_syntax || havebase)
> -                   {
> -                     *ins->obufp++ = ins->separator_char;
> -                     *ins->obufp = '\0';
> -                   }
> +                   oappend_char (ins, ins->separator_char, dis_style_text);
>                   if (indexes)
>                     {
>                       if (ins->address_mode == mode_64bit || vindex < 16)
> -                       oappend_maybe_intel (ins, indexes[vindex]);
> +                       oappend_maybe_intel_with_style (ins, indexes[vindex],
> +                                                       dis_style_register);
>                       else
>                         oappend (ins, "(bad)");
>                     }
> @@ -11614,26 +11764,22 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>                     oappend_maybe_intel (ins,
>                                          ins->address_mode == mode_64bit
>                                          && !addr32flag ? att_index64
> -                                                       : att_index32);
> +                                        : att_index32);
>
> -                 *ins->obufp++ = ins->scale_char;
> -                 *ins->obufp = '\0';
> +                 oappend_char (ins, ins->scale_char, dis_style_text);
>                   sprintf (ins->scratchbuf, "%d", 1 << scale);
> -                 oappend (ins, ins->scratchbuf);
> +                 oappend_with_style (ins, ins->scratchbuf,
> +                                     dis_style_immediate);
>                 }
>             }
>           if (ins->intel_syntax
>               && (disp || ins->modrm.mod != 0 || base == 5))
>             {
>               if (!havedisp || (bfd_signed_vma) disp >= 0)
> -               {
> -                 *ins->obufp++ = '+';
> -                 *ins->obufp = '\0';
> -               }
> +                 oappend_char (ins, '+', dis_style_text);
>               else if (ins->modrm.mod != 1 && disp != -disp)
>                 {
> -                 *ins->obufp++ = '-';
> -                 *ins->obufp = '\0';
> +                 oappend_char (ins, '-', dis_style_text);
>                   disp = -disp;
>                 }
>
> @@ -11644,8 +11790,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               oappend (ins, ins->scratchbuf);
>             }
>
> -         *ins->obufp++ = ins->close_char;
> -         *ins->obufp = '\0';
> +         oappend_char (ins, ins->close_char, dis_style_text);
>
>           if (check_gather)
>             {
> @@ -11666,7 +11811,8 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>             {
>               if (!ins->active_seg_prefix)
>                 {
> -                 oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> +                 oappend_maybe_intel_with_style
> +                   (ins, att_names_seg[ds_reg - es_reg], dis_style_register);
>                   oappend (ins, ":");
>                 }
>               print_operand_value (ins, ins->scratchbuf, 1, disp);
> @@ -11722,23 +11868,17 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>
>        if (ins->modrm.mod != 0 || ins->modrm.rm != 6)
>         {
> -         *ins->obufp++ = ins->open_char;
> -         *ins->obufp = '\0';
> -         oappend (ins,
> -                  (ins->intel_syntax ? intel_index16
> -                                     : att_index16)[ins->modrm.rm]);
> +         oappend_char (ins, ins->open_char, dis_style_text);
> +         oappend (ins, (ins->intel_syntax ? intel_index16
> +                        : att_index16)[ins->modrm.rm]);
>           if (ins->intel_syntax
>               && (disp || ins->modrm.mod != 0 || ins->modrm.rm == 6))
>             {
>               if ((bfd_signed_vma) disp >= 0)
> -               {
> -                 *ins->obufp++ = '+';
> -                 *ins->obufp = '\0';
> -               }
> +               oappend_char (ins, '+', dis_style_text);
>               else if (ins->modrm.mod != 1)
>                 {
> -                 *ins->obufp++ = '-';
> -                 *ins->obufp = '\0';
> +                 oappend_char (ins, '-', dis_style_text);
>                   disp = -disp;
>                 }
>
> @@ -11746,14 +11886,14 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>               oappend (ins, ins->scratchbuf);
>             }
>
> -         *ins->obufp++ = ins->close_char;
> -         *ins->obufp = '\0';
> +         oappend_char (ins, ins->close_char, dis_style_text);
>         }
>        else if (ins->intel_syntax)
>         {
>           if (!ins->active_seg_prefix)
>             {
> -             oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> +             oappend_maybe_intel_with_style
> +               (ins, att_names_seg[ds_reg - es_reg], dis_style_register);
>               oappend (ins, ":");
>             }
>           print_operand_value (ins, ins->scratchbuf, 1, disp & 0xffff);
> @@ -11969,7 +12109,8 @@ OP_REG (instr_info *ins, int code, int sizeflag)
>      {
>      case es_reg: case ss_reg: case cs_reg:
>      case ds_reg: case fs_reg: case gs_reg:
> -      oappend_maybe_intel (ins, att_names_seg[code - es_reg]);
> +      oappend_maybe_intel_with_style
> +       (ins, att_names_seg[code - es_reg], dis_style_register);
>        return;
>      }
>
> @@ -12022,7 +12163,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
>        oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
>        return;
>      }
> -  oappend_maybe_intel (ins, s);
> +  oappend_maybe_intel_with_style (ins, s, dis_style_register);
>  }
>
>  static void
> @@ -12063,7 +12204,7 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
>        oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
>        return;
>      }
> -  oappend_maybe_intel (ins, s);
> +  oappend_maybe_intel_with_style (ins, s, dis_style_register);
>  }
>
>  static void
> @@ -12118,7 +12259,7 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
>    op &= mask;
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, op);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_immediate);
>    ins->scratchbuf[0] = '\0';
>  }
>
> @@ -12136,7 +12277,7 @@ OP_I64 (instr_info *ins, int bytemode, int sizeflag)
>
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, get64 (ins));
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_immediate);
>    ins->scratchbuf[0] = '\0';
>  }
>
> @@ -12190,7 +12331,7 @@ OP_sI (instr_info *ins, int bytemode, int sizeflag)
>
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, op);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_immediate);
>  }
>
>  static void
> @@ -12248,7 +12389,8 @@ static void
>  OP_SEG (instr_info *ins, int bytemode, int sizeflag)
>  {
>    if (bytemode == w_mode)
> -    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg]);
> +    oappend_maybe_intel_with_style
> +      (ins, att_names_seg[ins->modrm.reg], dis_style_register);
>    else
>      OP_E (ins, ins->modrm.mod == 3 ? bytemode : w_mode, sizeflag);
>  }
> @@ -12294,12 +12436,13 @@ OP_OFF (instr_info *ins, int bytemode, int sizeflag)
>      {
>        if (!ins->active_seg_prefix)
>         {
> -         oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> +         oappend_maybe_intel_with_style (ins, att_names_seg[ds_reg - es_reg],
> +                                         dis_style_register);
>           oappend (ins, ":");
>         }
>      }
>    print_operand_value (ins, ins->scratchbuf, 1, off);
> -  oappend (ins, ins->scratchbuf);
> +  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
>  }
>
>  static void
> @@ -12324,12 +12467,14 @@ OP_OFF64 (instr_info *ins, int bytemode, int sizeflag)
>      {
>        if (!ins->active_seg_prefix)
>         {
> -         oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
> +         oappend_maybe_intel_with_style (ins,
> +                                         att_names_seg[ds_reg - es_reg],
> +                                         dis_style_register);
>           oappend (ins, ":");
>         }
>      }
>    print_operand_value (ins, ins->scratchbuf, 1, off);
> -  oappend (ins, ins->scratchbuf);
> +  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
>  }
>
>  static void
> @@ -12350,9 +12495,8 @@ ptr_reg (instr_info *ins, int code, int sizeflag)
>      s = att_names32[code - eAX_reg];
>    else
>      s = att_names16[code - eAX_reg];
> -  oappend_maybe_intel (ins, s);
> -  *ins->obufp++ = ins->close_char;
> -  *ins->obufp = 0;
> +  oappend_maybe_intel_with_style (ins, s, dis_style_register);
> +  oappend_char (ins, ins->close_char, dis_style_text);
>  }
>
>  static void
> @@ -12375,7 +12519,8 @@ OP_ESreg (instr_info *ins, int code, int sizeflag)
>           intel_operand_size (ins, b_mode, sizeflag);
>         }
>      }
> -  oappend_maybe_intel (ins, "%es:");
> +  oappend_maybe_intel_with_style (ins, "%es", dis_style_register);
> +  oappend_char (ins, ':', dis_style_text);
>    ptr_reg (ins, code, sizeflag);
>  }
>
> @@ -12470,7 +12615,7 @@ OP_MMX (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>      }
>    else
>      names = att_names_mm;
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -12543,7 +12688,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
>      }
>    else
>      names = att_names_xmm;
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -12603,7 +12748,7 @@ OP_EM (instr_info *ins, int bytemode, int sizeflag)
>      }
>    else
>      names = att_names_mm;
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
>  }
>
>  /* cvt* are the only instructions in sse2 which have
> @@ -12629,7 +12774,8 @@ OP_EMC (instr_info *ins, int bytemode, int sizeflag)
>    MODRM_CHECK;
>    ins->codep++;
>    ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
> -  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm]);
> +  oappend_maybe_intel_with_style (ins, att_names_mm[ins->modrm.rm],
> +                                 dis_style_register);
>  }
>
>  static void
> @@ -12813,7 +12959,7 @@ OP_3DNowSuffix (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>    ins->obufp = ins->mnemonicendp;
>    mnemonic = Suffix3DNow[*ins->codep++ & 0xff];
>    if (mnemonic)
> -    oappend (ins, mnemonic);
> +    ins->obufp = stpcpy (ins->obufp, mnemonic);
>    else
>      {
>        /* Since a variable sized ins->modrm/ins->sib chunk is between the start
> @@ -12959,7 +13105,7 @@ BadOp (instr_info *ins)
>  {
>    /* Throw away prefixes and 1st. opcode byte.  */
>    ins->codep = ins->insn_codep + 1;
> -  oappend (ins, "(bad)");
> +  ins->obufp = stpcpy (ins->obufp, "(bad)");
>  }
>
>  static void
> @@ -13172,7 +13318,8 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    switch (bytemode)
>      {
>      case scalar_mode:
> -      oappend_maybe_intel (ins, att_names_xmm[reg]);
> +      oappend_maybe_intel_with_style (ins, att_names_xmm[reg],
> +                                     dis_style_register);
>        return;
>
>      case vex_vsib_d_w_dq_mode:
> @@ -13183,9 +13330,11 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>        if (ins->vex.length == 128
>           || (bytemode != vex_vsib_d_w_dq_mode
>               && !ins->vex.w))
> -       oappend_maybe_intel (ins, att_names_xmm[reg]);
> +       oappend_maybe_intel_with_style (ins, att_names_xmm[reg],
> +                                       dis_style_register);
>        else
> -       oappend_maybe_intel (ins, att_names_ymm[reg]);
> +       oappend_maybe_intel_with_style (ins, att_names_ymm[reg],
> +                                       dis_style_register);
>
>        /* All 3 XMM/YMM registers must be distinct.  */
>        modrm_reg = ins->modrm.reg;
> @@ -13217,7 +13366,8 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>           /* This must be the 3rd operand.  */
>           if (ins->obufp != ins->op_out[2])
>             abort ();
> -         oappend_maybe_intel (ins, att_names_tmm[reg]);
> +         oappend_maybe_intel_with_style (ins, att_names_tmm[reg],
> +                                         dis_style_register);
>           if (reg == ins->modrm.reg || reg == ins->modrm.rm)
>             strcpy (ins->obufp, "/(bad)");
>         }
> @@ -13292,7 +13442,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>        abort ();
>        break;
>      }
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
>  }
>
>  static void
> @@ -13335,7 +13485,7 @@ OP_REG_VexI4 (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
>    if (bytemode == x_mode && ins->vex.length == 256)
>      names = att_names_ymm;
>
> -  oappend_maybe_intel (ins, names[reg]);
> +  oappend_maybe_intel_with_style (ins, names[reg], dis_style_register);
>
>    if (ins->vex.w)
>      {
> @@ -13352,7 +13502,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>  {
>    ins->scratchbuf[0] = '$';
>    print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
> -  oappend_maybe_intel (ins, ins->scratchbuf);
> +  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);

This change isn't needed.

>  }
>
>  static void
> @@ -13397,7 +13547,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);

This change isn't needed.

>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -13449,7 +13599,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);

This change isn't needed.

>        ins->scratchbuf[0] = '\0';
>      }
>  }
> @@ -13497,7 +13647,8 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>        /* We have a reserved extension byte.  Output it directly.  */
>        ins->scratchbuf[0] = '$';
>        print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
> -      oappend_maybe_intel (ins, ins->scratchbuf);
> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf,
> +                                     dis_style_immediate);
>        ins->scratchbuf[0] = '\0';
>      }
>  }
> --
> 2.25.4
>


-- 
H.J.


* Re: [PATCHv2] libopcodes: extend the styling within the i386 disassembler
  2022-05-18 12:27             ` Jan Beulich
@ 2022-05-26 12:48               ` Andrew Burgess
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-26 12:48 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 09.05.2022 14:54, Andrew Burgess via Binutils wrote:
>> @@ -248,6 +254,8 @@ struct instr_info
>>  
>>    enum x86_64_isa isa64;
>>  
>> +  int (*printf) (instr_info *ins, enum disassembler_style style,
>> +		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>>  };
>
> Why do you go through a function pointer? Afaics it's only ever set
> to i386_dis_printf(), so I don't see why you couldn't call the
> function directly.
>
>> @@ -9748,24 +9839,28 @@ print_insn (bfd_vma pc, instr_info *ins)
>>  	if (name == NULL)
>>  	  abort ();
>>  	prefix_length += strlen (name) + 1;
>> -	(*ins->info->fprintf_styled_func)
>> -	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
>> +	ins->printf (ins, dis_style_mnemonic, "%s ", name);
>>        }
>>  
>>    /* Check maximum code length.  */
>>    if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
>>      {
>> -      (*ins->info->fprintf_styled_func)
>> -	(ins->info->stream, dis_style_text, "(bad)");
>> +      ins->printf (ins, dis_style_text, "(bad)");
>>        return MAX_CODE_LENGTH;
>>      }
>>  
>> -  ins->obufp = ins->mnemonicendp;
>> -  for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
>> -    oappend (ins, " ");
>> -  oappend (ins, " ");
>> -  (*ins->info->fprintf_styled_func)
>> -    (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
>> +  i = strlen (ins->obuf);
>> +  if (ins->mnemonicendp == ins->obuf + i)
>
> What is this condition for? It doesn't look to match any of what the
> original code does. In particular it's unclear to me ...
>
>> +    {
>> +      i += prefix_length;
>> +      if (i < 6)
>> +	i = 6 - i + 1;
>> +      else
>> +	i = 1;
>> +    }
>> +  else
>> +    i = 0;
>
> ... what this "else" would cover.

This whole nonsense was a convoluted method of maintaining compatibility
with the existing disassembler when it comes to emitting trailing
whitespace.

I've now posted this separate patch:

  https://sourceware.org/pipermail/binutils/2022-May/121054.html

which fixes what I think are some inconsistencies in how the existing
disassembler handles whitespace.

With that patch merged this whole hunk will disappear from this patch.
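
For reference, the padding rule that the hunk was trying to preserve can
be sketched in isolation.  This is only an illustration, not code from
the patch: mnemonic_padding and format_mnemonic are hypothetical names,
and the sketch covers only the case where the mnemonic ends at the end
of the buffer (the "else" branch that emits no padding is left out).

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: number of spaces to emit after the mnemonic so
   that prefixes plus mnemonic fill at least 6 columns, followed by one
   separating space.  */
static int
mnemonic_padding (size_t mnemonic_len, size_t prefix_length)
{
  size_t used = mnemonic_len + prefix_length;

  return used < 6 ? (int) (6 - used + 1) : 1;
}

/* Hypothetical helper: format MNEMONIC and its padding into BUF.  The
   "%*s" conversion with an empty string prints exactly PAD spaces.  */
static int
format_mnemonic (char *buf, size_t size, const char *mnemonic,
		 size_t prefix_length)
{
  int pad;

  assert (buf != NULL && mnemonic != NULL);
  pad = mnemonic_padding (strlen (mnemonic), prefix_length);
  return snprintf (buf, size, "%s%*s", mnemonic, pad, "");
}
```

A short mnemonic is padded out to 6 columns plus one space, while a
mnemonic (or mnemonic plus prefixes) of 6 or more characters only gets
the single separating space.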

I'm in the process of addressing the remaining points that you and
H.J. have raised.

Thanks,
Andrew


>
>> @@ -10224,8 +10314,11 @@ static void
>>  OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>  	int sizeflag ATTRIBUTE_UNUSED)
>>  {
>> -  sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
>> -  oappend_maybe_intel (ins, ins->scratchbuf);
>> +  oappend_maybe_intel (ins, "%st");
>> +  oappend (ins, "(");
>
> Any reason these last two aren't simply
>
>   oappend_maybe_intel (ins, "%st(");
>
> ?
>
>> +  sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
>> +  oappend_with_style (ins, ins->scratchbuf, dis_style_immediate);
>
> This is not an immediate. The entire %st(N) is a register name (like
> anything that starts with % in AT&T mode).
>
>> @@ -10772,12 +10865,64 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>>    return 0;
>>  }
>>  
>> +/* Add a style marker to *INS->obufp that encodes STYLE.  This assumes that
>> +   the buffer pointed to by INS->obufp has space.  A style marker is made
>> +   from the STYLE_MARKER_CHAR followed by STYLE converted to a single hex
>> +   digit, followed by another STYLE_MARKER_CHAR.  This function assumes
>> +   that the number of styles is not greater than 16.  */
>> +
>>  static void
>> -oappend (instr_info *ins, const char *s)
>> +oappend_insert_style (instr_info *ins, enum disassembler_style style)
>>  {
>> +  int num = (int) style;
>> +
>> +  /* We currently assume that STYLE can be encoded as a single hex
>> +     character.  If more styles are added then this might start to fail,
>> +     and we'll need to expand this code.  */
>> +  if (num > 0xf)
>> +    abort ();
>
> You want to either also check for negative values or make "num" unsigned.
>
>> @@ -10789,26 +10934,27 @@ append_seg (instr_info *ins)
>>    switch (ins->active_seg_prefix)
>>      {
>>      case PREFIX_CS:
>> -      oappend_maybe_intel (ins, "%cs:");
>> +      oappend_maybe_intel_with_style (ins, "%cs", dis_style_register);
>
> I was about to ask why dis_style_register needs specifying here, but I
> notice the comment ahead of the function is misleading. There also are
> cases where a leading '$' would be skipped. I wonder though whether it
> wouldn't yield better readable code if those used a separate function,
> thus eliminating the need for the explicit style parameter. E.g.
> oappend_register() and oappend_immediate(). The "maybe_intel" part of
> the name isn't really useful imo.
>
>> @@ -13352,7 +13502,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>  {
>>    ins->scratchbuf[0] = '$';
>>    print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
>> -  oappend_maybe_intel (ins, ins->scratchbuf);
>> +  oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
>>  }
>>  
>>  static void
>> @@ -13397,7 +13547,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>        /* We have a reserved extension byte.  Output it directly.  */
>>        ins->scratchbuf[0] = '$';
>>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
>> -      oappend_maybe_intel (ins, ins->scratchbuf);
>> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
>>        ins->scratchbuf[0] = '\0';
>>      }
>>  }
>> @@ -13449,7 +13599,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>        /* We have a reserved extension byte.  Output it directly.  */
>>        ins->scratchbuf[0] = '$';
>>        print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
>> -      oappend_maybe_intel (ins, ins->scratchbuf);
>> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf, dis_style_text);
>>        ins->scratchbuf[0] = '\0';
>>      }
>>  }
>
> Why "text" for these three immediates, but ...
>
>> @@ -13497,7 +13647,8 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>>        /* We have a reserved extension byte.  Output it directly.  */
>>        ins->scratchbuf[0] = '$';
>>        print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
>> -      oappend_maybe_intel (ins, ins->scratchbuf);
>> +      oappend_maybe_intel_with_style (ins, ins->scratchbuf,
>> +				      dis_style_immediate);
>>        ins->scratchbuf[0] = '\0';
>>      }
>>  }
>
> ... "immediate" here?
>
> Jan



* [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-05-09 12:54           ` [PATCHv2] " Andrew Burgess
  2022-05-18 12:27             ` Jan Beulich
  2022-05-18 21:23             ` H.J. Lu
@ 2022-05-27 17:44             ` Andrew Burgess
  2022-05-30  8:19               ` Jan Beulich
  2022-06-10 10:56               ` Jan Beulich
  2 siblings, 2 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-05-27 17:44 UTC (permalink / raw)
  To: binutils; +Cc: Andrew Burgess

In patch v3:

  - Removed use of printf function pointer in instr_info for printing,
    we now call i386_dis_printf directly.

  - The strange code for emitting whitespace has now been removed
    thanks to the recently merged commit 202be274a41.

  - Made 'num' unsigned in oappend_insert_style.

  - Added oappend_register and oappend_immediate functions, these wrap
    around oappend_maybe_intel_with_style, but supply the appropriate
    style, started using these throughout.  There are a few cases
    where these are not used, but for these cases the original code
    didn't go through the 'maybe_intel' logic, so instead we just call
    oappend_with_style and pass the register/immediate style as
    needed.

  - Fixed a bunch of missed register/immediate styling by switching to
    the use of oappend_register/oappend_immediate, and by following up
    on Jan's and H.J.'s feedback.

  - The oappend_char function now defaults to dis_style_text, as that
    was the only style ever used with that function.

In patch v2:

  - Addressed all minor feedback items from Vladimir, H.J. and Jan,

  - Switched to using \002 as the styling escape character,

  - Escape character is defined once near the top of i386-dis.c making
    it easy to switch to a different character if needed,

  - Detection of the style escape character is stricter in
    i386_dis_printf,

  - Proper error handling in i386_dis_printf, though I can't imagine
    when this would actually trigger.

---

The i386 disassembler is pretty complex.  Most disassembly is done
indirectly; operands are built into buffers within a struct instr_info
instance, before finally being printed later in the disassembly
process.

Sometimes the operand buffers are built in a different order to the
order in which they will eventually be printed.

Each operand can contain multiple components, e.g. multiple registers,
immediates, other textual elements (commas, brackets, etc).

When looking for how to apply styling I guess the ideal solution would
be to move away from the operands being a single string that is built
up, and instead have each operand be a list of "parts", where each
part is some text and a style.  Then, when we eventually print the
operand we would loop over the parts and print each part with the
correct style.
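
To make the comparison concrete, here is a rough sketch of what such a
"list of parts" operand might look like.  This is not code from the
patch: the names sketch_style, operand_part, operand, operand_add and
operand_render are all hypothetical, and the "<N>text" rendering simply
stands in for a real call to a styled printing callback.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for a few values of enum disassembler_style.  */
enum sketch_style { SK_TEXT, SK_REGISTER, SK_IMMEDIATE };

/* One piece of an operand: some text plus the style to print it with.  */
struct operand_part
{
  const char *text;
  enum sketch_style style;
};

#define MAX_PARTS 8

/* An operand held as a list of parts rather than one flat string.  */
struct operand
{
  struct operand_part parts[MAX_PARTS];
  size_t count;
};

/* Append one part to OP.  The text is not copied, so it must outlive OP.  */
static void
operand_add (struct operand *op, const char *text, enum sketch_style style)
{
  assert (op->count < MAX_PARTS);
  op->parts[op->count].text = text;
  op->parts[op->count].style = style;
  op->count++;
}

/* Loop over the parts of OP, writing each one to OUT as "<style>text".
   Returns the number of characters written, or -1 if OUT is too small.  */
static int
operand_render (const struct operand *op, char *out, size_t size)
{
  size_t used = 0;

  if (size == 0)
    return -1;
  out[0] = '\0';

  for (size_t i = 0; i < op->count; i++)
    {
      int n = snprintf (out + used, size - used, "<%d>%s",
			(int) op->parts[i].style, op->parts[i].text);
      if (n < 0 || (size_t) n >= size - used)
	return -1;
      used += (size_t) n;
    }
  return (int) used;
}
```

Even in this toy form every operand needs its own storage for the part
list, and every place that currently appends to a string would have to
be converted, which gives a feel for the size of that change.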

But it feels like a huge amount of work to move from where we are
now to that potentially ideal solution.  Plus, the above solution
would be pretty complex.

So, instead I propose a .... different solution here, one that works
with the existing infrastructure.

As each operand is built up, piece by piece, we pass through style
information.  This style information is then encoded into the operand
buffer (see below for details).  After this the code can continue to
operate as it does right now in order to manage the set of operand
buffers.

Then, as each operand is printed we can split the operand buffer into
chunks at the style marker boundaries, with each chunk being printed
with the correct style.
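
As an illustration of the splitting step, here is a simplified,
standalone sketch.  It assumes the marker format described later in
this message (\002, one hex digit, \002).  It is not the
i386_dis_printf function from the patch: split_styled and hex_value are
hypothetical names, the "[style:text]" output stands in for a call to
the styled printing callback, and unlike the patch it skips empty
chunks.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define STYLE_MARKER_CHAR '\002'

/* Convert a character already known to be a hex digit to its value.  */
static unsigned
hex_value (char c)
{
  if (c >= '0' && c <= '9')
    return (unsigned) (c - '0');
  return (unsigned) (tolower ((unsigned char) c) - 'a') + 10;
}

/* Split BUF at style markers.  For every non-empty chunk append
   "[style:text]" to OUT, where style is printed as a hex number.
   DEFAULT_STYLE applies to any text before the first marker.
   Returns 0 on success, or -1 if OUT is too small.  */
static int
split_styled (const char *buf, unsigned default_style,
	      char *out, size_t size)
{
  unsigned style = default_style;
  const char *start = buf, *curr = buf;
  size_t used = 0;

  assert (size > 0);
  out[0] = '\0';

  for (;;)
    {
      /* Only a complete marker counts; a lone marker character is
	 treated as ordinary text.  */
      int at_marker = (curr[0] == STYLE_MARKER_CHAR
		       && isxdigit ((unsigned char) curr[1])
		       && curr[2] == STYLE_MARKER_CHAR);

      if (*curr != '\0' && !at_marker)
	{
	  ++curr;
	  continue;
	}

      /* Emit the text between START and CURR in the current style.  */
      if (curr > start)
	{
	  int n = snprintf (out + used, size - used, "[%x:%.*s]",
			    style, (int) (curr - start), start);
	  if (n < 0 || (size_t) n >= size - used)
	    return -1;
	  used += (size_t) n;
	}

      if (*curr == '\0')
	break;

      /* Pick up the new style, then step over all three marker bytes.  */
      style = hex_value (curr[1]);
      curr += 3;
      start = curr;
    }
  return 0;
}
```

Feeding this a buffer holding a marker for style 4, the text "%rax", a
marker for style 0 and a comma yields "[4:%rax][0:,]"; the style
numbers here are arbitrary and are not meant to match particular
enum disassembler_style values.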

For encoding the style information I use a single character, currently
\002, followed by the style encoded as a single hex digit, followed
again by the \002 character.

This of course relies on there not being more than 16 styles, but that
is currently true, and hopefully will remain true for the foreseeable
future.
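
A minimal sketch of the encoding side, for illustration only.  The
helper names put_style_marker and append_styled are hypothetical (the
patch itself uses oappend_insert_style and oappend_with_style on
ins->obufp), and the caller is assumed to have made sure the buffer is
large enough.

```c
#include <assert.h>
#include <string.h>

#define STYLE_MARKER_CHAR '\002'

/* Write a three byte style marker for STYLE_NUM at P and return the new
   end of the string.  The caller guarantees there is room for the
   marker plus a terminating null.  */
static char *
put_style_marker (char *p, unsigned style_num)
{
  /* A single hex digit can only represent 16 distinct styles.  */
  assert (style_num <= 0xf);

  *p++ = STYLE_MARKER_CHAR;
  *p++ = "0123456789abcdef"[style_num];
  *p++ = STYLE_MARKER_CHAR;
  *p = '\0';
  return p;
}

/* Append a style marker followed by TEXT at P, returning the new end of
   the string (pointing at the terminating null).  */
static char *
append_styled (char *p, const char *text, unsigned style_num)
{
  size_t len = strlen (text);

  p = put_style_marker (p, style_num);
  memcpy (p, text, len + 1);
  return p + len;
}
```

Appending "%rax" with style 4 and then "," with style 0 leaves the
buffer holding \002 4 \002 %rax \002 0 \002 , (eleven bytes plus the
null), so each styled piece costs three extra bytes of buffer space.
The style numbers are arbitrary examples.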

The other major concern that has arisen around this work is whether
the escape character could ever be encountered in output naturally
generated by the disassembler.  If this did happen then the escape
characters would be stripped from the output, and the wrong styling
would be applied.

However, I don't believe that this is currently a problem.
Disassembler content comes from a number of sources.  First there's
content that is copied directly from the i386-dis.c file, this is things
like register names, and other syntax elements (brackets, commas,
etc).  We can easily check that the i386-dis.c file doesn't contain
our special character.

The next source of content are immediate operands.  The text for these
operands is generated by calls into libc.  By selecting a
non-printable character we can be confident that this is not something
that libc will generate as part of an immediate representation.

The other output that appears to be from the disassembler is operands
that contain addresses and (possibly) symbol names.  It is quite
possible that a symbol name might contain any special character we
could imagine, so is this a problem?

I don't think it is.  We don't actually print address and symbol
operands through the disassembler; instead, the disassembler calls
back to the user (objdump, gdb, etc) to print the address and symbol
on its behalf.  This content is printed directly to the output stream,
it does not pass through the i386 disassembler output buffers.  As a
result, we never check this particular output for styling escape
characters.

In some (not very scientific) benchmarking on my machine,
disassembling a reasonably large (142M) shared library, I'm not seeing
any significant slow down in disassembler speed with this change.

Most instructions are now being fully syntax highlighted when I
disassemble using the --disassembler-color=extended-color option.  I'm
sure that there are probably still a few corner cases that need fixing
up, but we can come back to them later I think.

When disassembler syntax highlighting is not being used, then there
should be no user visible changes after this commit.
---
 opcodes/i386-dis.c | 418 ++++++++++++++++++++++++++++++---------------
 1 file changed, 282 insertions(+), 136 deletions(-)

diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 7b99969b239..f66e374d79b 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -47,6 +47,8 @@ static void dofloat (instr_info *, int);
 static void OP_ST (instr_info *, int, int);
 static void OP_STi (instr_info *, int, int);
 static int putop (instr_info *, const char *, int);
+static void oappend_with_style (instr_info *, const char *,
+				enum disassembler_style);
 static void oappend (instr_info *, const char *);
 static void append_seg (instr_info *);
 static void OP_indirE (instr_info *, int, int);
@@ -116,6 +118,10 @@ static void FXSAVE_Fixup (instr_info *, int, int);
 static void MOVSXD_Fixup (instr_info *, int, int);
 static void DistinctDest_Fixup (instr_info *, int, int);
 
+/* This character is used to encode style information within the output
+   buffers.  See oappend_insert_style for more details.  */
+#define STYLE_MARKER_CHAR '\002'
+
 struct dis_private {
   /* Points to first byte not fetched.  */
   bfd_byte *max_fetched;
@@ -247,7 +253,6 @@ struct instr_info
   char scale_char;
 
   enum x86_64_isa isa64;
-
 };
 
 /* Mark parts used in the REX prefix.  When we are testing for
@@ -9299,11 +9304,117 @@ get_sib (instr_info *ins, int sizeflag)
 }
 
 /* Like oappend (below), but S is a string starting with '%'.
-   In Intel syntax, the '%' is elided.  */
+   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
+   part of the output in the disassembler.
+
+   This function should not be used directly from the general disassembler
+   code, instead the helpers oappend_register and oappend_immediate should
+   be called as appropriate.  */
+
+static void
+oappend_maybe_intel_with_style (instr_info *ins, const char *s,
+				enum disassembler_style style)
+{
+  oappend_with_style (ins, s + ins->intel_syntax, style);
+}
+
+/* Like oappend_maybe_intel_with_style above, but called when S is the
+   name of a register.  */
+
 static void
-oappend_maybe_intel (instr_info *ins, const char *s)
+oappend_register (instr_info *ins, const char *s)
+{
+  oappend_maybe_intel_with_style (ins, s, dis_style_register);
+}
+
+/* Like oappend_maybe_intel_with_style above, but called when S represents
+   an immediate.  */
+
+static void
+oappend_immediate (instr_info *ins, const char *s)
+{
+  oappend_maybe_intel_with_style (ins, s, dis_style_immediate);
+}
+
+/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
+   STYLE is the default style to use in the fprintf_styled_func calls,
+   however, FMT might include embedded style markers (see oappend_style),
+   these embedded markers are not printed, but instead change the style
+   used in the next fprintf_styled_func call.
+
+   Return non-zero to indicate the print call was a success.  */
+
+static int ATTRIBUTE_PRINTF_3
+i386_dis_printf (instr_info *ins, enum disassembler_style style,
+		 const char *fmt, ...)
 {
-  oappend (ins, s + ins->intel_syntax);
+  va_list ap;
+  enum disassembler_style curr_style = style;
+  char *start, *curr;
+  char staging_area[100];
+  int res;
+
+  va_start (ap, fmt);
+  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
+  va_end (ap);
+
+  if (res < 0)
+    return res;
+
+  start = curr = staging_area;
+
+  do
+    {
+      if (*curr == '\0'
+	  || (*curr == STYLE_MARKER_CHAR
+	      && ISXDIGIT (*(curr + 1))
+	      && *(curr + 2) == STYLE_MARKER_CHAR))
+	{
+	  /* Output content between our START position and CURR.  */
+	  int len = curr - start;
+	  int n = (*ins->info->fprintf_styled_func) (ins->info->stream,
+						     curr_style,
+						     "%.*s", len, start);
+	  if (n < 0)
+	    {
+	      res = n;
+	      break;
+	    }
+
+	  if (*curr == '\0')
+	    break;
+
+	  /* Skip over the initial STYLE_MARKER_CHAR.  */
+	  ++curr;
+
+	  /* Update the CURR_STYLE.  As there are less than 16 styles, it
+	     is possible, that if the input is corrupted in some way, that
+	     we might set CURR_STYLE to an invalid value.  Don't worry
+	     though, we check for this situation.  */
+	  if (*curr >= '0' && *curr <= '9')
+	    curr_style = (enum disassembler_style) (*curr - '0');
+	  else if (*curr >= 'a' && *curr <= 'f')
+	    curr_style = (enum disassembler_style) (*curr - 'a' + 10);
+	  else
+	    curr_style = dis_style_text;
+
+	  /* Check for an invalid style having been selected.  This should
+	     never happen, but it doesn't hurt to be a little paranoid.  */
+	  if (curr_style > dis_style_comment_start)
+	    curr_style = dis_style_text;
+
+	  /* Skip the hex character, and the closing STYLE_MARKER_CHAR.  */
+	  curr += 2;
+
+	  /* Reset the START to after the style marker.  */
+	  start = curr;
+	}
+      else
+	++curr;
+    }
+  while (true);
+
+  return res;
 }
 
 static int
@@ -9404,8 +9515,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 
   if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 _("64-bit address is disabled"));
+      i386_dis_printf (ins, dis_style_text, _("64-bit address is disabled"));
       return -1;
     }
 
@@ -9454,16 +9564,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	{
 	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
 	  if (name != NULL)
-	    (*ins->info->fprintf_styled_func)
-	      (ins->info->stream, dis_style_mnemonic, "%s", name);
+	    i386_dis_printf (ins, dis_style_mnemonic, "%s", name);
 	  else
 	    {
 	      /* Just print the first byte as a .byte instruction.  */
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_assembler_directive, ".byte ");
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_immediate, "0x%x",
-		 (unsigned int) priv.the_buffer[0]);
+	      i386_dis_printf (ins, dis_style_assembler_directive,
+			       ".byte ");
+	      i386_dis_printf (ins, dis_style_immediate, "0x%x",
+			       (unsigned int) priv.the_buffer[0]);
 	    }
 
 	  return 1;
@@ -9481,10 +9589,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       for (i = 0;
 	   i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s%s",
-	   (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
-					     sizeflag));
+	i386_dis_printf (ins, dis_style_mnemonic, "%s%s",
+			 (i == 0 ? "" : " "),
+			 prefix_name (ins, ins->all_prefixes[i], sizeflag));
       return i;
     }
 
@@ -9499,11 +9606,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       /* Handle ins->prefixes before fwait.  */
       for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ",
-	   prefix_name (ins, ins->all_prefixes[i], sizeflag));
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_mnemonic, "fwait");
+	i386_dis_printf (ins, dis_style_mnemonic, "%s ",
+			 prefix_name (ins, ins->all_prefixes[i], sizeflag));
+      i386_dis_printf (ins, dis_style_mnemonic, "fwait");
       return i + 1;
     }
 
@@ -9572,10 +9677,10 @@ print_insn (bfd_vma pc, instr_info *ins)
 		  /* Don't print {%k0}.  */
 		  if (ins->vex.mask_register_specifier)
 		    {
+		      const char *reg_name
+			= att_names_mask[ins->vex.mask_register_specifier];
 		      oappend (ins, "{");
-		      oappend_maybe_intel (ins,
-					   att_names_mask
-					   [ins->vex.mask_register_specifier]);
+		      oappend_register (ins, reg_name);
 		      oappend (ins, "}");
 		    }
 		  if (ins->vex.zeroing)
@@ -9652,16 +9757,14 @@ print_insn (bfd_vma pc, instr_info *ins)
      are all 0s in inverted form.  */
   if (ins->need_vex && ins->vex.register_specifier != 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      i386_dis_printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
   /* If EVEX.z is set, there must be an actual mask register in use.  */
   if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      i386_dis_printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
@@ -9672,8 +9775,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	 the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
       if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  i386_dis_printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       ins->used_prefixes |= PREFIX_DATA;
@@ -9700,8 +9802,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	  || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
 	      && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  i386_dis_printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       break;
@@ -9751,15 +9852,13 @@ print_insn (bfd_vma pc, instr_info *ins)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
+	i386_dis_printf (ins, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
   if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
     {
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_text, "(bad)");
+      i386_dis_printf (ins, dis_style_text, "(bad)");
       return MAX_CODE_LENGTH;
     }
 
@@ -9783,8 +9882,7 @@ print_insn (bfd_vma pc, instr_info *ins)
     i = 0;
 
   /* Print the instruction mnemonic along with any trailing whitespace.  */
-  (*ins->info->fprintf_styled_func)
-    (ins->info->stream, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
+  i386_dis_printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
 
   /* The enter and bound instructions are printed with operands in the same
      order as the intel book; everything else is printed in reverse order.  */
@@ -9839,8 +9937,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    break;
 	  }
 	if (needcomma)
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, ",");
+	  i386_dis_printf (ins, dis_style_text, ",");
 	if (ins->op_index[i] != -1 && !ins->op_riprel[i])
 	  {
 	    bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
@@ -9856,18 +9953,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    (*ins->info->print_address_func) (target, ins->info);
 	  }
 	else
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "%s",
-					     op_txt[i]);
+	  i386_dis_printf (ins, dis_style_text, "%s", op_txt[i]);
 	needcomma = 1;
       }
 
   for (i = 0; i < MAX_OPERANDS; i++)
     if (ins->op_index[i] != -1 && ins->op_riprel[i])
       {
-	(*ins->info->fprintf_styled_func) (ins->info->stream,
-					   dis_style_comment_start,
-					   "        # ");
+	i386_dis_printf (ins, dis_style_comment_start, "        # ");
 	(*ins->info->print_address_func) ((bfd_vma)
 			(ins->start_pc + (ins->codep - ins->start_codep)
 			 + ins->op_address[ins->op_index[i]]), ins->info);
@@ -10252,7 +10345,7 @@ static void
 OP_ST (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
        int sizeflag ATTRIBUTE_UNUSED)
 {
-  oappend_maybe_intel (ins, "%st");
+  oappend_register (ins, "%st");
 }
 
 static void
@@ -10260,7 +10353,7 @@ OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
   sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_register (ins, ins->scratchbuf);
 }
 
 /* Capital letters in template are macros.  */
@@ -10807,12 +10900,73 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
   return 0;
 }
 
+/* Add a style marker to *INS->obufp that encodes STYLE.  This assumes that
+   the buffer pointed to by INS->obufp has space.  A style marker is made
+   from the STYLE_MARKER_CHAR followed by STYLE converted to a single hex
+   digit, followed by another STYLE_MARKER_CHAR.  This function assumes
+   that the number of styles is not greater than 16.  */
+
 static void
-oappend (instr_info *ins, const char *s)
+oappend_insert_style (instr_info *ins, enum disassembler_style style)
 {
+  unsigned num = (unsigned) style;
+
+  /* We currently assume that STYLE can be encoded as a single hex
+     character.  If more styles are added then this might start to fail,
+     and we'll need to expand this code.  */
+  if (num > 0xf)
+    abort ();
+
+  *ins->obufp++ = STYLE_MARKER_CHAR;
+  *ins->obufp++ = (num < 10 ? ('0' + num)
+		   : ((num < 16) ? ('a' + (num - 10)) : '0'));
+  *ins->obufp++ = STYLE_MARKER_CHAR;
+
+  /* This final null character is not strictly necessary, after inserting a
+     style marker we should always be inserting some additional content.
+     However, having the buffer null terminated doesn't cost much, and make
+     it easier to debug what's going on.  Also, if we do ever forget to add
+     any additional content after this style marker, then the buffer will
+     still be well formed.  */
+  *ins->obufp = '\0';
+}
+
+static void
+oappend_with_style (instr_info *ins, const char *s,
+		    enum disassembler_style style)
+{
+  oappend_insert_style (ins, style);
   ins->obufp = stpcpy (ins->obufp, s);
 }
 
+/* Like oappend_with_style but always with text style.  */
+
+static void
+oappend (instr_info *ins, const char *s)
+{
+  oappend_with_style (ins, s, dis_style_text);
+}
+
+/* Add a single character C to the buffer pointer to by INS->obufp, marking
+   the style for the character as STYLE.  */
+
+static void
+oappend_char_with_style (instr_info *ins, const char c,
+			 enum disassembler_style style)
+{
+  oappend_insert_style (ins, style);
+  *ins->obufp++ = c;
+  *ins->obufp = '\0';
+}
+
+/* Like oappend_char_with_style, but always uses dis_style_text.  */
+
+static void
+oappend_char (instr_info *ins, const char c)
+{
+  oappend_char_with_style (ins, c, dis_style_text);
+}
+
 static void
 append_seg (instr_info *ins)
 {
@@ -10824,26 +10978,27 @@ append_seg (instr_info *ins)
   switch (ins->active_seg_prefix)
     {
     case PREFIX_CS:
-      oappend_maybe_intel (ins, "%cs:");
+      oappend_register (ins, "%cs");
       break;
     case PREFIX_DS:
-      oappend_maybe_intel (ins, "%ds:");
+      oappend_register (ins, "%ds");
       break;
     case PREFIX_SS:
-      oappend_maybe_intel (ins, "%ss:");
+      oappend_register (ins, "%ss");
       break;
     case PREFIX_ES:
-      oappend_maybe_intel (ins, "%es:");
+      oappend_register (ins, "%es");
       break;
     case PREFIX_FS:
-      oappend_maybe_intel (ins, "%fs:");
+      oappend_register (ins, "%fs");
       break;
     case PREFIX_GS:
-      oappend_maybe_intel (ins, "%gs:");
+      oappend_register (ins, "%gs");
       break;
     default:
       break;
     }
+  oappend_char (ins, ':');
 }
 
 static void
@@ -11331,7 +11486,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -11595,11 +11750,15 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      print_displacement (ins, ins->scratchbuf, disp);
 	    else
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
-	    oappend (ins, ins->scratchbuf);
+	    oappend_with_style (ins, ins->scratchbuf,
+				dis_style_address_offset);
 	    if (riprel)
 	      {
 		set_op (ins, disp, true);
-		oappend (ins, !addr32flag ? "(%rip)" : "(%eip)");
+		oappend_char (ins, '(');
+		oappend_with_style (ins, !addr32flag ? "%rip" : "%eip",
+				    dis_style_register);
+		oappend_char (ins, ')');
 	      }
 	  }
 
@@ -11613,17 +11772,18 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (havedisp || (ins->intel_syntax && riprel))
 	{
-	  *ins->obufp++ = ins->open_char;
+	  oappend_char (ins, ins->open_char);
 	  if (ins->intel_syntax && riprel)
 	    {
 	      set_op (ins, disp, true);
-	      oappend (ins, !addr32flag ? "rip" : "eip");
+	      oappend_with_style (ins, !addr32flag ? "rip" : "eip",
+				  dis_style_register);
 	    }
-	  *ins->obufp = '\0';
 	  if (havebase)
-	    oappend_maybe_intel (ins,
-				 (ins->address_mode == mode_64bit && !addr32flag
-				  ? att_names64 : att_names32)[rbase]);
+	    oappend_register
+	      (ins,
+	       (ins->address_mode == mode_64bit && !addr32flag
+		? att_names64 : att_names32)[rbase]);
 	  if (ins->has_sib)
 	    {
 	      /* ESP/RSP won't allow index.  If base isn't ESP/RSP,
@@ -11634,41 +11794,35 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		  || (havebase && base != ESP_REG_NUM))
 		{
 		  if (!ins->intel_syntax || havebase)
-		    {
-		      *ins->obufp++ = ins->separator_char;
-		      *ins->obufp = '\0';
-		    }
+		    oappend_char (ins, ins->separator_char);
 		  if (indexes)
 		    {
 		      if (ins->address_mode == mode_64bit || vindex < 16)
-			oappend_maybe_intel (ins, indexes[vindex]);
+			oappend_register (ins, indexes[vindex]);
 		      else
 			oappend (ins, "(bad)");
 		    }
 		  else
-		    oappend_maybe_intel (ins,
-					 ins->address_mode == mode_64bit
-					 && !addr32flag ? att_index64
-							: att_index32);
+		    oappend_register (ins,
+				      ins->address_mode == mode_64bit
+				      && !addr32flag
+				      ? att_index64
+				      : att_index32);
 
-		  *ins->obufp++ = ins->scale_char;
-		  *ins->obufp = '\0';
+		  oappend_char (ins, ins->scale_char);
 		  sprintf (ins->scratchbuf, "%d", 1 << scale);
-		  oappend (ins, ins->scratchbuf);
+		  oappend_with_style (ins, ins->scratchbuf,
+				      dis_style_immediate);
 		}
 	    }
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || base == 5))
 	    {
 	      if (!havedisp || (bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		  oappend_char (ins, '+');
 	      else if (ins->modrm.mod != 1 && disp != -disp)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-');
 		  disp = -disp;
 		}
 
@@ -11679,8 +11833,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      oappend (ins, ins->scratchbuf);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char);
 
 	  if (check_gather)
 	    {
@@ -11701,7 +11854,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	    {
 	      if (!ins->active_seg_prefix)
 		{
-		  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+		  oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 		  oappend (ins, ":");
 		}
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
@@ -11757,23 +11910,17 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (ins->modrm.mod != 0 || ins->modrm.rm != 6)
 	{
-	  *ins->obufp++ = ins->open_char;
-	  *ins->obufp = '\0';
-	  oappend (ins,
-		   (ins->intel_syntax ? intel_index16
-				      : att_index16)[ins->modrm.rm]);
+	  oappend_char (ins, ins->open_char);
+	  oappend (ins, (ins->intel_syntax ? intel_index16
+			 : att_index16)[ins->modrm.rm]);
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || ins->modrm.rm == 6))
 	    {
 	      if ((bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		oappend_char (ins, '+');
 	      else if (ins->modrm.mod != 1)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-');
 		  disp = -disp;
 		}
 
@@ -11781,14 +11928,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      oappend (ins, ins->scratchbuf);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char);
 	}
       else if (ins->intel_syntax)
 	{
 	  if (!ins->active_seg_prefix)
 	    {
-	      oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	      oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 	      oappend (ins, ":");
 	    }
 	  print_operand_value (ins, ins->scratchbuf, 1, disp & 0xffff);
@@ -11999,7 +12145,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
     {
     case es_reg: case ss_reg: case cs_reg:
     case ds_reg: case fs_reg: case gs_reg:
-      oappend_maybe_intel (ins, att_names_seg[code - es_reg]);
+      oappend_register (ins, att_names_seg[code - es_reg]);
       return;
     }
 
@@ -12052,7 +12198,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_register (ins, s);
 }
 
 static void
@@ -12093,7 +12239,7 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_register (ins, s);
 }
 
 static void
@@ -12148,7 +12294,7 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
   op &= mask;
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12166,7 +12312,7 @@ OP_I64 (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, get64 (ins));
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12220,7 +12366,7 @@ OP_sI (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
 }
 
 static void
@@ -12278,7 +12424,7 @@ static void
 OP_SEG (instr_info *ins, int bytemode, int sizeflag)
 {
   if (bytemode == w_mode)
-    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg]);
+    oappend_register (ins, att_names_seg[ins->modrm.reg]);
   else
     OP_E (ins, ins->modrm.mod == 3 ? bytemode : w_mode, sizeflag);
 }
@@ -12324,12 +12470,12 @@ OP_OFF (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	  oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 	  oappend (ins, ":");
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12354,12 +12500,12 @@ OP_OFF64 (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	  oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 	  oappend (ins, ":");
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12380,9 +12526,8 @@ ptr_reg (instr_info *ins, int code, int sizeflag)
     s = att_names32[code - eAX_reg];
   else
     s = att_names16[code - eAX_reg];
-  oappend_maybe_intel (ins, s);
-  *ins->obufp++ = ins->close_char;
-  *ins->obufp = 0;
+  oappend_register (ins, s);
+  oappend_char (ins, ins->close_char);
 }
 
 static void
@@ -12405,7 +12550,8 @@ OP_ESreg (instr_info *ins, int code, int sizeflag)
 	  intel_operand_size (ins, b_mode, sizeflag);
 	}
     }
-  oappend_maybe_intel (ins, "%es:");
+  oappend_register (ins, "%es");
+  oappend_char (ins, ':');
   ptr_reg (ins, code, sizeflag);
 }
 
@@ -12455,7 +12601,7 @@ OP_C (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
   else
     add = 0;
   sprintf (ins->scratchbuf, "%%cr%d", ins->modrm.reg + add);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_register (ins, ins->scratchbuf);
 }
 
 static void
@@ -12480,7 +12626,7 @@ OP_T (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
       int sizeflag ATTRIBUTE_UNUSED)
 {
   sprintf (ins->scratchbuf, "%%tr%d", ins->modrm.reg);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_register (ins, ins->scratchbuf);
 }
 
 static void
@@ -12500,7 +12646,7 @@ OP_MMX (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -12575,7 +12721,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
     }
   else
     names = att_names_xmm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -12635,7 +12781,7 @@ OP_EM (instr_info *ins, int bytemode, int sizeflag)
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 /* cvt* are the only instructions in sse2 which have
@@ -12661,7 +12807,7 @@ OP_EMC (instr_info *ins, int bytemode, int sizeflag)
   MODRM_CHECK;
   ins->codep++;
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm]);
+  oappend_register (ins, att_names_mm[ins->modrm.rm]);
 }
 
 static void
@@ -12669,7 +12815,7 @@ OP_MXC (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.reg]);
+  oappend_register (ins, att_names_mm[ins->modrm.reg]);
 }
 
 static void
@@ -12845,7 +12991,7 @@ OP_3DNowSuffix (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
   ins->obufp = ins->mnemonicendp;
   mnemonic = Suffix3DNow[*ins->codep++ & 0xff];
   if (mnemonic)
-    oappend (ins, mnemonic);
+    ins->obufp = stpcpy (ins->obufp, mnemonic);
   else
     {
       /* Since a variable sized ins->modrm/ins->sib chunk is between the start
@@ -12934,7 +13080,7 @@ CMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -12991,7 +13137,7 @@ BadOp (instr_info *ins)
 {
   /* Throw away prefixes and 1st. opcode byte.  */
   ins->codep = ins->insn_codep + 1;
-  oappend (ins, "(bad)");
+  ins->obufp = stpcpy (ins->obufp, "(bad)");
 }
 
 static void
@@ -13155,7 +13301,7 @@ XMM_Fixup (instr_info *ins, int reg, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -13204,7 +13350,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   switch (bytemode)
     {
     case scalar_mode:
-      oappend_maybe_intel (ins, att_names_xmm[reg]);
+      oappend_register (ins, att_names_xmm[reg]);
       return;
 
     case vex_vsib_d_w_dq_mode:
@@ -13215,9 +13361,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       if (ins->vex.length == 128
 	  || (bytemode != vex_vsib_d_w_dq_mode
 	      && !ins->vex.w))
-	oappend_maybe_intel (ins, att_names_xmm[reg]);
+	oappend_register (ins, att_names_xmm[reg]);
       else
-	oappend_maybe_intel (ins, att_names_ymm[reg]);
+	oappend_register (ins, att_names_ymm[reg]);
 
       /* All 3 XMM/YMM registers must be distinct.  */
       modrm_reg = ins->modrm.reg;
@@ -13249,7 +13395,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  /* This must be the 3rd operand.  */
 	  if (ins->obufp != ins->op_out[2])
 	    abort ();
-	  oappend_maybe_intel (ins, att_names_tmm[reg]);
+	  oappend_register (ins, att_names_tmm[reg]);
 	  if (reg == ins->modrm.reg || reg == ins->modrm.rm)
 	    strcpy (ins->obufp, "/(bad)");
 	}
@@ -13327,7 +13473,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       abort ();
       break;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -13370,7 +13516,7 @@ OP_REG_VexI4 (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (bytemode == x_mode && ins->vex.length == 256)
     names = att_names_ymm;
 
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 
   if (ins->vex.w)
     {
@@ -13387,7 +13533,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 {
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
 }
 
 static void
@@ -13432,7 +13578,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13484,7 +13630,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13532,7 +13678,7 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
-- 
2.25.4


* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-05-27 17:44             ` [PATCHv3] " Andrew Burgess
@ 2022-05-30  8:19               ` Jan Beulich
  2022-05-31 17:20                 ` Andrew Burgess
  2022-06-10 10:56               ` Jan Beulich
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-05-30  8:19 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 27.05.2022 19:44, Andrew Burgess via Binutils wrote:
> @@ -9299,11 +9304,117 @@ get_sib (instr_info *ins, int sizeflag)
>  }
>  
>  /* Like oappend (below), but S is a string starting with '%'.
> -   In Intel syntax, the '%' is elided.  */
> +   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
> +   part of the output in the disassembler.

As you're touching this comment anyway, can you add reference to
'$'? Or alternatively (that's what I was envisioning with the
comment on v2) drop this function altogether, doing what it does
separately in oappend_register() and oappend_immediate()?

> +   This function should not be used directly from the general disassembler
> +   code, instead the helpers oappend_register and oappend_immediate should
> +   be called as appropriate.  */
> +
> +static void
> +oappend_maybe_intel_with_style (instr_info *ins, const char *s,
> +				enum disassembler_style style)
> +{
> +  oappend_with_style (ins, s + ins->intel_syntax, style);
> +}
> +
> +/* Like oappend_maybe_intel_with_style above, but called when S is the
> +   name of a register.  */
> +
>  static void
> -oappend_maybe_intel (instr_info *ins, const char *s)
> +oappend_register (instr_info *ins, const char *s)
> +{
> +  oappend_maybe_intel_with_style (ins, s, dis_style_register);
> +}
> +
> +/* Like oappend_maybe_intel_with_style above, but called when S represents
> +   an immediate.  */
> +
> +static void
> +oappend_immediate (instr_info *ins, const char *s)
> +{
> +  oappend_maybe_intel_with_style (ins, s, dis_style_immediate);
> +}
> +
> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
> +   STYLE is the default style to use in the fprintf_styled_func calls,
> +   however, FMT might include embedded style markers (see oappend_style),
> +   these embedded markers are not printed, but instead change the style
> +   used in the next fprintf_styled_func call.
> +
> +   Return non-zero to indicate the print call was a success.  */
> +
> +static int ATTRIBUTE_PRINTF_3
> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
> +		 const char *fmt, ...)
>  {
> -  oappend (ins, s + ins->intel_syntax);
> +  va_list ap;
> +  enum disassembler_style curr_style = style;
> +  char *start, *curr;
> +  char staging_area[100];
> +  int res;
> +
> +  va_start (ap, fmt);
> +  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
> +  va_end (ap);
> +
> +  if (res < 0)
> +    return res;

Perhaps additionally assert no truncation?

Everything else looks good to me, thanks. One more question though
below.

> @@ -9404,8 +9515,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>  
>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>      {
> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> -					 _("64-bit address is disabled"));
> +      i386_dis_printf (ins, dis_style_text, _("64-bit address is disabled"));

Just wondering: Couldn't there be an "error" style?

Jan


* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-05-30  8:19               ` Jan Beulich
@ 2022-05-31 17:20                 ` Andrew Burgess
  2022-06-01  5:59                   ` Jan Beulich
  0 siblings, 1 reply; 29+ messages in thread
From: Andrew Burgess @ 2022-05-31 17:20 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 27.05.2022 19:44, Andrew Burgess via Binutils wrote:
>> @@ -9299,11 +9304,117 @@ get_sib (instr_info *ins, int sizeflag)
>>  }
>>  
>>  /* Like oappend (below), but S is a string starting with '%'.
>> -   In Intel syntax, the '%' is elided.  */
>> +   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
>> +   part of the output in the disassembler.
>
> As you're touching this comment anyway, can you add reference to
> '$'?

Done.

>      Or alternatively (that's what I was envisioning with the
> comment on v2) drop this function altogether, doing what it does
> separately in oappend_register() and oappend_immediate()?

I didn't do this (though I will if you insist), I'd just prefer to keep
the "magic" for how we handle the intel syntax (character skipping) in a
single place.
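
To illustrate the character skipping: a minimal standalone sketch, assuming
intel_syntax only ever holds 0 (AT&T) or 1 (Intel), which is what the
pointer arithmetic in oappend_maybe_intel_with_style implies.  The demo_*
names are hypothetical and are not part of i386-dis.c.

```c
#include <assert.h>

/* Hypothetical stand-in for the intel_syntax field of instr_info:
   0 selects AT&T syntax, 1 selects Intel syntax (an assumption).  */
struct demo_ins
{
  int intel_syntax;
};

/* Return the text that would be appended for S, which must start with
   '%' (register) or '$' (immediate).  Adding intel_syntax to the pointer
   skips that one leading character in Intel mode and nothing in AT&T
   mode.  */
static const char *
demo_maybe_intel (const struct demo_ins *ins, const char *s)
{
  assert (s[0] == '%' || s[0] == '$');
  return s + ins->intel_syntax;
}
```

With intel_syntax set to 1, passing "%eax" yields a pointer to "eax"; with
it set to 0 the string is returned unchanged.  Keeping this in one helper
means the register and immediate wrappers only differ in the style they
pass.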

>
>> +   This function should not be used directly from the general disassembler
>> +   code, instead the helpers oappend_register and oappend_immediate should
>> +   be called as appropriate.  */
>> +
>> +static void
>> +oappend_maybe_intel_with_style (instr_info *ins, const char *s,
>> +				enum disassembler_style style)
>> +{
>> +  oappend_with_style (ins, s + ins->intel_syntax, style);
>> +}
>> +
>> +/* Like oappend_maybe_intel_with_style above, but called when S is the
>> +   name of a register.  */
>> +
>>  static void
>> -oappend_maybe_intel (instr_info *ins, const char *s)
>> +oappend_register (instr_info *ins, const char *s)
>> +{
>> +  oappend_maybe_intel_with_style (ins, s, dis_style_register);
>> +}
>> +
>> +/* Like oappend_maybe_intel_with_style above, but called when S represents
>> +   an immediate.  */
>> +
>> +static void
>> +oappend_immediate (instr_info *ins, const char *s)
>> +{
>> +  oappend_maybe_intel_with_style (ins, s, dis_style_immediate);
>> +}
>> +
>> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
>> +   STYLE is the default style to use in the fprintf_styled_func calls,
>> +   however, FMT might include embedded style markers (see oappend_style),
>> +   these embedded markers are not printed, but instead change the style
>> +   used in the next fprintf_styled_func call.
>> +
>> +   Return non-zero to indicate the print call was a success.  */
>> +
>> +static int ATTRIBUTE_PRINTF_3
>> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
>> +		 const char *fmt, ...)
>>  {
>> -  oappend (ins, s + ins->intel_syntax);
>> +  va_list ap;
>> +  enum disassembler_style curr_style = style;
>> +  char *start, *curr;
>> +  char staging_area[100];
>> +  int res;
>> +
>> +  va_start (ap, fmt);
>> +  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
>> +  va_end (ap);
>> +
>> +  if (res < 0)
>> +    return res;
>
> Perhaps additionally assert no truncation?

Done.
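
The truncation check relies on vsnprintf returning the length the complete
output would have needed, so a result that is not smaller than the buffer
size means the text was cut short.  A minimal sketch of that check follows;
demo_format_checked is a hypothetical name, and it reports truncation by
returning -1 where the patch below calls abort ().

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>

/* Format into BUF, which holds SIZE bytes.  Return the formatted length
   on success, or -1 on an encoding error or if the output would not
   fit (hypothetical helper, not part of i386-dis.c).  */
static int
demo_format_checked (char *buf, size_t size, const char *fmt, ...)
{
  va_list ap;
  int res;

  assert (buf != NULL && size > 0);

  va_start (ap, fmt);
  res = vsnprintf (buf, size, fmt, ap);
  va_end (ap);

  if (res < 0)
    return -1;

  /* vsnprintf reports the untruncated length, excluding the null
     terminator, so RES >= SIZE means the output did not fit.  */
  if ((size_t) res >= size)
    return -1;

  return res;
}
```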

>
> Everything else looks good to me, thanks. One more question though
> below.
>
>> @@ -9404,8 +9515,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>  
>>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>>      {
>> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> -					 _("64-bit address is disabled"));
>> +      i386_dis_printf (ins, dis_style_text, _("64-bit address is disabled"));
>
> Just wondering: Couldn't there be an "error" style?

I've avoided an error style because I don't think the disassembler
_should_ be emitting errors like this.

I'll go so far as to say that I consider this case a bug in the i386
disassembler.

IMHO, if we pass some content to the disassembler then it should
disassemble it to something, that might just be .word or .byte
directives rather than real instructions, but we should disassemble to
something.

In the above, isn't the "error" really just a reflection that the
disassembler has been written using bfd_vma in places where either
uint64_t or int64_t would have been a better choice?

If we did decide that the disassembler should be able to handle errors
other than memory errors, I think the correct solution would be to
either add (yet) another callback which is like the memory error
callback, but for different errors.  Or, modify the existing error
callback to handle different types of error maybe....

... anyway, I don't think we should do that, but I don't think we should
add an error style either as I feel it will just encourage bad behaviour
when writing the disassemblers.

Patch below includes the updates you asked for above.

Thanks,
Andrew

---

commit 4f2276d0bc3707358461fe2d3cb6cfa8378846d8
Author: Andrew Burgess <aburgess@redhat.com>
Date:   Fri Apr 22 11:23:02 2022 +0100

    libopcodes: extend the styling within the i386 disassembler
    
    The i386 disassembler is pretty complex.  Most disassembly is done
    indirectly; operands are built into buffers within a struct instr_info
    instance, before finally being printed later in the disassembly
    process.
    
    Sometimes the operand buffers are built in a different order to the
    order in which they will eventually be printed.
    
    Each operand can contain multiple components, e.g. multiple registers,
    immediates, other textual elements (commas, brackets, etc).
    
    When looking for how to apply styling I guess the ideal solution would
    be to move away from the operands being a single string that is built
    up, and instead have each operand be a list of "parts", where each
    part is some text and a style.  Then, when we eventually print the
    operand we would loop over the parts and print each part with the
    correct style.
    
    But it feels like a huge amount of work to move from where we are
    now to that potentially ideal solution.  Plus, the above solution
    would be pretty complex.
    
    So, instead I propose a .... different solution here, one that works
    with the existing infrastructure.
    
    As each operand is built up, piece by piece, we pass through style
    information.  This style information is then encoded into the operand
    buffer (see below for details).  After this the code can continue to
    operate as it does right now in order to manage the set of operand
    buffers.
    
    Then, as each operand is printed we can split the operand buffer into
    chunks at the style marker boundaries, with each chunk being printed
    with the correct style.
    
    For encoding the style information I use a single character, currently
    \002, followed by the style encoded as a single hex digit, followed
    again by the \002 character.
    
    This of course relies on there not being more than 16 styles, but that
    is currently true, and hopefully will remain true for the foreseeable
    future.
    
    The other major concern that has arisen around this work is whether
    the escape character could ever be encountered in output naturally
    generated by the disassembler.  If this did happen then the escape
    characters would be stripped from the output, and the wrong styling
    would be applied.
    
    However, I don't believe that this is currently a problem.
    Disassembler content comes from a number of sources.  First there's
    content that is copied directly from the i386-dis.c file, this is things
    like register names, and other syntax elements (brackets, commas,
    etc).  We can easily check that the i386-dis.c file doesn't contain
    our special character.
    
    The next source of content are immediate operands.  The text for these
    operands is generated by calls into libc.  By selecting a
    non-printable character we can be confident that this is not something
    that libc will generate as part of an immediate representation.
    
    The other output that appears to be from the disassembler is operands
    that contain addresses and (possibly) symbol names.  It is quite
    possible that a symbol name might contain any special character we
    could imagine, so is this a problem?
    
    I don't think it is, we don't actually print address and symbol
    operands through the disassembler, instead, the disassembler calls
    back to the user (objdump, gdb, etc) to print the address and symbol
    on its behalf.  This content is printed directly to the output stream,
    it does not pass through the i386 disassembler output buffers.  As a
    result, we never check this particular output for styling escape
    characters.
    
    In some (not very scientific) benchmarking on my machine,
    disassembling a reasonably large (142M) shared library, I'm not seeing
    any significant slowdown in disassembler speed with this change.
    
    Most instructions are now being fully syntax highlighted when I
    disassemble using the --disassembler-color=extended-color option.  I'm
    sure that there are probably still a few corner cases that need fixing
    up, but we can come back to them later I think.
    
    When disassembler syntax highlighting is not being used, then there
    should be no user visible changes after this commit.
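
The marker scheme described in the commit message can be sketched in
isolation as follows.  This is a simplified illustration under stated
assumptions, not the code from the patch: the demo_* names are
hypothetical, styles are plain integers 0..15 rather than enum
disassembler_style values, and empty chunks are skipped instead of being
passed to the output callback.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define DEMO_MARKER '\002'

/* Append a three-byte style marker for STYLE (0..15) at P, keep the
   buffer null terminated, and return the new end of the buffer.  */
static char *
demo_put_style (char *p, unsigned style)
{
  assert (style <= 0xf);
  *p++ = DEMO_MARKER;
  *p++ = (char) (style < 10 ? '0' + style : 'a' + (style - 10));
  *p++ = DEMO_MARKER;
  *p = '\0';
  return p;
}

typedef void (*demo_emit_fn) (unsigned style, const char *text,
			      size_t len, void *ctx);

/* Split BUF at style markers, calling EMIT once per non-empty chunk
   with the style that was in force for that chunk.  */
static void
demo_split (const char *buf, unsigned default_style, demo_emit_fn emit,
	    void *ctx)
{
  unsigned style = default_style;
  const char *start = buf, *curr = buf;

  for (;;)
    {
      int at_marker = (curr[0] == DEMO_MARKER
		       && isxdigit ((unsigned char) curr[1])
		       && curr[2] == DEMO_MARKER);

      if (*curr == '\0' || at_marker)
	{
	  if (curr > start)
	    emit (style, start, (size_t) (curr - start), ctx);
	  if (*curr == '\0')
	    break;

	  /* Decode the single hex digit between the two markers.  */
	  if (isdigit ((unsigned char) curr[1]))
	    style = (unsigned) (curr[1] - '0');
	  else
	    style = (unsigned) (tolower ((unsigned char) curr[1]) - 'a' + 10);

	  curr += 3;
	  start = curr;
	}
      else
	++curr;
    }
}

/* Example EMIT callback: records each chunk as "<style>:<text>;".  */
struct demo_log
{
  char text[128];
  size_t used;
};

static void
demo_record (unsigned style, const char *text, size_t len, void *ctx)
{
  struct demo_log *log = ctx;
  size_t room = sizeof log->text - log->used;
  int n = snprintf (log->text + log->used, room, "%x:%.*s;",
		    style, (int) len, text);

  if (n > 0 && (size_t) n < room)
    log->used += (size_t) n;
}
```

Building a buffer with demo_put_style, copying operand text after each
marker, and then running demo_split over it hands each piece of text to
the callback together with its style, which is the role
fprintf_styled_func plays in the patch.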

diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
index 7b99969b239..f7b5e3b7319 100644
--- a/opcodes/i386-dis.c
+++ b/opcodes/i386-dis.c
@@ -47,6 +47,8 @@ static void dofloat (instr_info *, int);
 static void OP_ST (instr_info *, int, int);
 static void OP_STi (instr_info *, int, int);
 static int putop (instr_info *, const char *, int);
+static void oappend_with_style (instr_info *, const char *,
+				enum disassembler_style);
 static void oappend (instr_info *, const char *);
 static void append_seg (instr_info *);
 static void OP_indirE (instr_info *, int, int);
@@ -116,6 +118,10 @@ static void FXSAVE_Fixup (instr_info *, int, int);
 static void MOVSXD_Fixup (instr_info *, int, int);
 static void DistinctDest_Fixup (instr_info *, int, int);
 
+/* This character is used to encode style information within the output
+   buffers.  See oappend_insert_style for more details.  */
+#define STYLE_MARKER_CHAR '\002'
+
 struct dis_private {
   /* Points to first byte not fetched.  */
   bfd_byte *max_fetched;
@@ -247,7 +253,6 @@ struct instr_info
   char scale_char;
 
   enum x86_64_isa isa64;
-
 };
 
 /* Mark parts used in the REX prefix.  When we are testing for
@@ -9298,12 +9303,121 @@ get_sib (instr_info *ins, int sizeflag)
     ins->has_sib = false;
 }
 
-/* Like oappend (below), but S is a string starting with '%'.
-   In Intel syntax, the '%' is elided.  */
+/* Like oappend (below), but S is a string starting with '%' or '$'.  In
+   Intel syntax, the '%' or '$' is elided.  STYLE is used when displaying
+   this part of the output in the disassembler.
+
+   This function should not be used directly from the general disassembler
+   code, instead the helpers oappend_register and oappend_immediate should
+   be called as appropriate.  */
+
+static void
+oappend_maybe_intel_with_style (instr_info *ins, const char *s,
+				enum disassembler_style style)
+{
+  oappend_with_style (ins, s + ins->intel_syntax, style);
+}
+
+/* Like oappend_maybe_intel_with_style above, but called when S is the
+   name of a register.  */
+
 static void
-oappend_maybe_intel (instr_info *ins, const char *s)
+oappend_register (instr_info *ins, const char *s)
+{
+  oappend_maybe_intel_with_style (ins, s, dis_style_register);
+}
+
+/* Like oappend_maybe_intel_with_style above, but called when S represents
+   an immediate.  */
+
+static void
+oappend_immediate (instr_info *ins, const char *s)
+{
+  oappend_maybe_intel_with_style (ins, s, dis_style_immediate);
+}
+
+/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
+   STYLE is the default style to use in the fprintf_styled_func calls,
+   however, FMT might include embedded style markers (see
+   oappend_insert_style), these embedded markers are not printed, but
+   instead change the style used in the next fprintf_styled_func call.
+
+   Return non-zero to indicate the print call was a success.  */
+
+static int ATTRIBUTE_PRINTF_3
+i386_dis_printf (instr_info *ins, enum disassembler_style style,
+		 const char *fmt, ...)
 {
-  oappend (ins, s + ins->intel_syntax);
+  va_list ap;
+  enum disassembler_style curr_style = style;
+  char *start, *curr;
+  char staging_area[100];
+  int res;
+
+  va_start (ap, fmt);
+  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
+  va_end (ap);
+
+  if (res < 0)
+    return res;
+
+  if ((size_t) res >= sizeof (staging_area))
+    abort ();
+
+  start = curr = staging_area;
+
+  do
+    {
+      if (*curr == '\0'
+	  || (*curr == STYLE_MARKER_CHAR
+	      && ISXDIGIT (*(curr + 1))
+	      && *(curr + 2) == STYLE_MARKER_CHAR))
+	{
+	  /* Output content between our START position and CURR.  */
+	  int len = curr - start;
+	  int n = (*ins->info->fprintf_styled_func) (ins->info->stream,
+						     curr_style,
+						     "%.*s", len, start);
+	  if (n < 0)
+	    {
+	      res = n;
+	      break;
+	    }
+
+	  if (*curr == '\0')
+	    break;
+
+	  /* Skip over the initial STYLE_MARKER_CHAR.  */
+	  ++curr;
+
+	  /* Update the CURR_STYLE.  As there are less than 16 styles, it
+	     is possible, that if the input is corrupted in some way, that
+	     we might set CURR_STYLE to an invalid value.  Don't worry
+	     though, we check for this situation.  */
+	  if (*curr >= '0' && *curr <= '9')
+	    curr_style = (enum disassembler_style) (*curr - '0');
+	  else if (*curr >= 'a' && *curr <= 'f')
+	    curr_style = (enum disassembler_style) (*curr - 'a' + 10);
+	  else
+	    curr_style = dis_style_text;
+
+	  /* Check for an invalid style having been selected.  This should
+	     never happen, but it doesn't hurt to be a little paranoid.  */
+	  if (curr_style > dis_style_comment_start)
+	    curr_style = dis_style_text;
+
+	  /* Skip the hex character, and the closing STYLE_MARKER_CHAR.  */
+	  curr += 2;
+
+	  /* Reset the START to after the style marker.  */
+	  start = curr;
+	}
+      else
+	++curr;
+    }
+  while (true);
+
+  return res;
 }
 
 static int
@@ -9404,8 +9518,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 
   if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 _("64-bit address is disabled"));
+      i386_dis_printf (ins, dis_style_text, _("64-bit address is disabled"));
       return -1;
     }
 
@@ -9454,16 +9567,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	{
 	  name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
 	  if (name != NULL)
-	    (*ins->info->fprintf_styled_func)
-	      (ins->info->stream, dis_style_mnemonic, "%s", name);
+	    i386_dis_printf (ins, dis_style_mnemonic, "%s", name);
 	  else
 	    {
 	      /* Just print the first byte as a .byte instruction.  */
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_assembler_directive, ".byte ");
-	      (*ins->info->fprintf_styled_func)
-		(ins->info->stream, dis_style_immediate, "0x%x",
-		 (unsigned int) priv.the_buffer[0]);
+	      i386_dis_printf (ins, dis_style_assembler_directive,
+			       ".byte ");
+	      i386_dis_printf (ins, dis_style_immediate, "0x%x",
+			       (unsigned int) priv.the_buffer[0]);
 	    }
 
 	  return 1;
@@ -9481,10 +9592,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       for (i = 0;
 	   i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s%s",
-	   (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
-					     sizeflag));
+	i386_dis_printf (ins, dis_style_mnemonic, "%s%s",
+			 (i == 0 ? "" : " "),
+			 prefix_name (ins, ins->all_prefixes[i], sizeflag));
       return i;
     }
 
@@ -9499,11 +9609,9 @@ print_insn (bfd_vma pc, instr_info *ins)
       /* Handle ins->prefixes before fwait.  */
       for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
 	   i++)
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ",
-	   prefix_name (ins, ins->all_prefixes[i], sizeflag));
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_mnemonic, "fwait");
+	i386_dis_printf (ins, dis_style_mnemonic, "%s ",
+			 prefix_name (ins, ins->all_prefixes[i], sizeflag));
+      i386_dis_printf (ins, dis_style_mnemonic, "fwait");
       return i + 1;
     }
 
@@ -9572,10 +9680,10 @@ print_insn (bfd_vma pc, instr_info *ins)
 		  /* Don't print {%k0}.  */
 		  if (ins->vex.mask_register_specifier)
 		    {
+		      const char *reg_name
+			= att_names_mask[ins->vex.mask_register_specifier];
 		      oappend (ins, "{");
-		      oappend_maybe_intel (ins,
-					   att_names_mask
-					   [ins->vex.mask_register_specifier]);
+		      oappend_register (ins, reg_name);
 		      oappend (ins, "}");
 		    }
 		  if (ins->vex.zeroing)
@@ -9652,16 +9760,14 @@ print_insn (bfd_vma pc, instr_info *ins)
      are all 0s in inverted form.  */
   if (ins->need_vex && ins->vex.register_specifier != 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      i386_dis_printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
   /* If EVEX.z is set, there must be an actual mask register in use.  */
   if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
     {
-      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
-					 "(bad)");
+      i386_dis_printf (ins, dis_style_text, "(bad)");
       return ins->end_codep - priv.the_buffer;
     }
 
@@ -9672,8 +9778,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	 the encoding invalid.  Most other PREFIX_OPCODE rules still apply.  */
       if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  i386_dis_printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       ins->used_prefixes |= PREFIX_DATA;
@@ -9700,8 +9805,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	  || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
 	      && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
 	{
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "(bad)");
+	  i386_dis_printf (ins, dis_style_text, "(bad)");
 	  return ins->end_codep - priv.the_buffer;
 	}
       break;
@@ -9751,15 +9855,13 @@ print_insn (bfd_vma pc, instr_info *ins)
 	if (name == NULL)
 	  abort ();
 	prefix_length += strlen (name) + 1;
-	(*ins->info->fprintf_styled_func)
-	  (ins->info->stream, dis_style_mnemonic, "%s ", name);
+	i386_dis_printf (ins, dis_style_mnemonic, "%s ", name);
       }
 
   /* Check maximum code length.  */
   if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
     {
-      (*ins->info->fprintf_styled_func)
-	(ins->info->stream, dis_style_text, "(bad)");
+      i386_dis_printf (ins, dis_style_text, "(bad)");
       return MAX_CODE_LENGTH;
     }
 
@@ -9783,8 +9885,7 @@ print_insn (bfd_vma pc, instr_info *ins)
     i = 0;
 
   /* Print the instruction mnemonic along with any trailing whitespace.  */
-  (*ins->info->fprintf_styled_func)
-    (ins->info->stream, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
+  i386_dis_printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
 
   /* The enter and bound instructions are printed with operands in the same
      order as the intel book; everything else is printed in reverse order.  */
@@ -9839,8 +9940,7 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    break;
 	  }
 	if (needcomma)
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, ",");
+	  i386_dis_printf (ins, dis_style_text, ",");
 	if (ins->op_index[i] != -1 && !ins->op_riprel[i])
 	  {
 	    bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
@@ -9856,18 +9956,14 @@ print_insn (bfd_vma pc, instr_info *ins)
 	    (*ins->info->print_address_func) (target, ins->info);
 	  }
 	else
-	  (*ins->info->fprintf_styled_func) (ins->info->stream,
-					     dis_style_text, "%s",
-					     op_txt[i]);
+	  i386_dis_printf (ins, dis_style_text, "%s", op_txt[i]);
 	needcomma = 1;
       }
 
   for (i = 0; i < MAX_OPERANDS; i++)
     if (ins->op_index[i] != -1 && ins->op_riprel[i])
       {
-	(*ins->info->fprintf_styled_func) (ins->info->stream,
-					   dis_style_comment_start,
-					   "        # ");
+	i386_dis_printf (ins, dis_style_comment_start, "        # ");
 	(*ins->info->print_address_func) ((bfd_vma)
 			(ins->start_pc + (ins->codep - ins->start_codep)
 			 + ins->op_address[ins->op_index[i]]), ins->info);
@@ -10252,7 +10348,7 @@ static void
 OP_ST (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
        int sizeflag ATTRIBUTE_UNUSED)
 {
-  oappend_maybe_intel (ins, "%st");
+  oappend_register (ins, "%st");
 }
 
 static void
@@ -10260,7 +10356,7 @@ OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
   sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_register (ins, ins->scratchbuf);
 }
 
 /* Capital letters in template are macros.  */
@@ -10807,12 +10903,73 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
   return 0;
 }
 
+/* Add a style marker to *INS->obufp that encodes STYLE.  This assumes that
+   the buffer pointed to by INS->obufp has space.  A style marker is made
+   from the STYLE_MARKER_CHAR followed by STYLE converted to a single hex
+   digit, followed by another STYLE_MARKER_CHAR.  This function assumes
+   that the number of styles is not greater than 16.  */
+
 static void
-oappend (instr_info *ins, const char *s)
+oappend_insert_style (instr_info *ins, enum disassembler_style style)
 {
+  unsigned num = (unsigned) style;
+
+  /* We currently assume that STYLE can be encoded as a single hex
+     character.  If more styles are added then this might start to fail,
+     and we'll need to expand this code.  */
+  if (num > 0xf)
+    abort ();
+
+  *ins->obufp++ = STYLE_MARKER_CHAR;
+  *ins->obufp++ = (num < 10 ? ('0' + num)
+		   : ((num < 16) ? ('a' + (num - 10)) : '0'));
+  *ins->obufp++ = STYLE_MARKER_CHAR;
+
+  /* This final null character is not strictly necessary, after inserting a
+     style marker we should always be inserting some additional content.
+     However, having the buffer null terminated doesn't cost much, and makes
+     it easier to debug what's going on.  Also, if we do ever forget to add
+     any additional content after this style marker, then the buffer will
+     still be well formed.  */
+  *ins->obufp = '\0';
+}
+
+static void
+oappend_with_style (instr_info *ins, const char *s,
+		    enum disassembler_style style)
+{
+  oappend_insert_style (ins, style);
   ins->obufp = stpcpy (ins->obufp, s);
 }
 
+/* Like oappend_with_style but always with text style.  */
+
+static void
+oappend (instr_info *ins, const char *s)
+{
+  oappend_with_style (ins, s, dis_style_text);
+}
+
+/* Add a single character C to the buffer pointed to by INS->obufp, marking
+   the style for the character as STYLE.  */
+
+static void
+oappend_char_with_style (instr_info *ins, const char c,
+			 enum disassembler_style style)
+{
+  oappend_insert_style (ins, style);
+  *ins->obufp++ = c;
+  *ins->obufp = '\0';
+}
+
+/* Like oappend_char_with_style, but always uses dis_style_text.  */
+
+static void
+oappend_char (instr_info *ins, const char c)
+{
+  oappend_char_with_style (ins, c, dis_style_text);
+}
+
 static void
 append_seg (instr_info *ins)
 {
@@ -10824,26 +10981,27 @@ append_seg (instr_info *ins)
   switch (ins->active_seg_prefix)
     {
     case PREFIX_CS:
-      oappend_maybe_intel (ins, "%cs:");
+      oappend_register (ins, "%cs");
       break;
     case PREFIX_DS:
-      oappend_maybe_intel (ins, "%ds:");
+      oappend_register (ins, "%ds");
       break;
     case PREFIX_SS:
-      oappend_maybe_intel (ins, "%ss:");
+      oappend_register (ins, "%ss");
       break;
     case PREFIX_ES:
-      oappend_maybe_intel (ins, "%es:");
+      oappend_register (ins, "%es");
       break;
     case PREFIX_FS:
-      oappend_maybe_intel (ins, "%fs:");
+      oappend_register (ins, "%fs");
       break;
     case PREFIX_GS:
-      oappend_maybe_intel (ins, "%gs:");
+      oappend_register (ins, "%gs");
       break;
     default:
       break;
     }
+  oappend_char (ins, ':');
 }
 
 static void
@@ -11331,7 +11489,7 @@ print_register (instr_info *ins, unsigned int reg, unsigned int rexmask,
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -11595,11 +11753,15 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      print_displacement (ins, ins->scratchbuf, disp);
 	    else
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
-	    oappend (ins, ins->scratchbuf);
+	    oappend_with_style (ins, ins->scratchbuf,
+				dis_style_address_offset);
 	    if (riprel)
 	      {
 		set_op (ins, disp, true);
-		oappend (ins, !addr32flag ? "(%rip)" : "(%eip)");
+		oappend_char (ins, '(');
+		oappend_with_style (ins, !addr32flag ? "%rip" : "%eip",
+				    dis_style_register);
+		oappend_char (ins, ')');
 	      }
 	  }
 
@@ -11613,17 +11775,18 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (havedisp || (ins->intel_syntax && riprel))
 	{
-	  *ins->obufp++ = ins->open_char;
+	  oappend_char (ins, ins->open_char);
 	  if (ins->intel_syntax && riprel)
 	    {
 	      set_op (ins, disp, true);
-	      oappend (ins, !addr32flag ? "rip" : "eip");
+	      oappend_with_style (ins, !addr32flag ? "rip" : "eip",
+				  dis_style_register);
 	    }
-	  *ins->obufp = '\0';
 	  if (havebase)
-	    oappend_maybe_intel (ins,
-				 (ins->address_mode == mode_64bit && !addr32flag
-				  ? att_names64 : att_names32)[rbase]);
+	    oappend_register
+	      (ins,
+	       (ins->address_mode == mode_64bit && !addr32flag
+		? att_names64 : att_names32)[rbase]);
 	  if (ins->has_sib)
 	    {
 	      /* ESP/RSP won't allow index.  If base isn't ESP/RSP,
@@ -11634,41 +11797,35 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 		  || (havebase && base != ESP_REG_NUM))
 		{
 		  if (!ins->intel_syntax || havebase)
-		    {
-		      *ins->obufp++ = ins->separator_char;
-		      *ins->obufp = '\0';
-		    }
+		    oappend_char (ins, ins->separator_char);
 		  if (indexes)
 		    {
 		      if (ins->address_mode == mode_64bit || vindex < 16)
-			oappend_maybe_intel (ins, indexes[vindex]);
+			oappend_register (ins, indexes[vindex]);
 		      else
 			oappend (ins, "(bad)");
 		    }
 		  else
-		    oappend_maybe_intel (ins,
-					 ins->address_mode == mode_64bit
-					 && !addr32flag ? att_index64
-							: att_index32);
+		    oappend_register (ins,
+				      ins->address_mode == mode_64bit
+				      && !addr32flag
+				      ? att_index64
+				      : att_index32);
 
-		  *ins->obufp++ = ins->scale_char;
-		  *ins->obufp = '\0';
+		  oappend_char (ins, ins->scale_char);
 		  sprintf (ins->scratchbuf, "%d", 1 << scale);
-		  oappend (ins, ins->scratchbuf);
+		  oappend_with_style (ins, ins->scratchbuf,
+				      dis_style_immediate);
 		}
 	    }
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || base == 5))
 	    {
 	      if (!havedisp || (bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		oappend_char (ins, '+');
 	      else if (ins->modrm.mod != 1 && disp != -disp)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-');
 		  disp = -disp;
 		}
 
@@ -11679,8 +11836,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      oappend (ins, ins->scratchbuf);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char);
 
 	  if (check_gather)
 	    {
@@ -11701,7 +11857,7 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	    {
 	      if (!ins->active_seg_prefix)
 		{
-		  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+		  oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 		  oappend (ins, ":");
 		}
 	      print_operand_value (ins, ins->scratchbuf, 1, disp);
@@ -11757,23 +11913,17 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 
       if (ins->modrm.mod != 0 || ins->modrm.rm != 6)
 	{
-	  *ins->obufp++ = ins->open_char;
-	  *ins->obufp = '\0';
-	  oappend (ins,
-		   (ins->intel_syntax ? intel_index16
-				      : att_index16)[ins->modrm.rm]);
+	  oappend_char (ins, ins->open_char);
+	  oappend (ins, (ins->intel_syntax ? intel_index16
+			 : att_index16)[ins->modrm.rm]);
 	  if (ins->intel_syntax
 	      && (disp || ins->modrm.mod != 0 || ins->modrm.rm == 6))
 	    {
 	      if ((bfd_signed_vma) disp >= 0)
-		{
-		  *ins->obufp++ = '+';
-		  *ins->obufp = '\0';
-		}
+		oappend_char (ins, '+');
 	      else if (ins->modrm.mod != 1)
 		{
-		  *ins->obufp++ = '-';
-		  *ins->obufp = '\0';
+		  oappend_char (ins, '-');
 		  disp = -disp;
 		}
 
@@ -11781,14 +11931,13 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
 	      oappend (ins, ins->scratchbuf);
 	    }
 
-	  *ins->obufp++ = ins->close_char;
-	  *ins->obufp = '\0';
+	  oappend_char (ins, ins->close_char);
 	}
       else if (ins->intel_syntax)
 	{
 	  if (!ins->active_seg_prefix)
 	    {
-	      oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	      oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 	      oappend (ins, ":");
 	    }
 	  print_operand_value (ins, ins->scratchbuf, 1, disp & 0xffff);
@@ -11999,7 +12148,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
     {
     case es_reg: case ss_reg: case cs_reg:
     case ds_reg: case fs_reg: case gs_reg:
-      oappend_maybe_intel (ins, att_names_seg[code - es_reg]);
+      oappend_register (ins, att_names_seg[code - es_reg]);
       return;
     }
 
@@ -12052,7 +12201,7 @@ OP_REG (instr_info *ins, int code, int sizeflag)
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_register (ins, s);
 }
 
 static void
@@ -12093,7 +12242,7 @@ OP_IMREG (instr_info *ins, int code, int sizeflag)
       oappend (ins, INTERNAL_DISASSEMBLER_ERROR);
       return;
     }
-  oappend_maybe_intel (ins, s);
+  oappend_register (ins, s);
 }
 
 static void
@@ -12148,7 +12297,7 @@ OP_I (instr_info *ins, int bytemode, int sizeflag)
   op &= mask;
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12166,7 +12315,7 @@ OP_I64 (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, get64 (ins));
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
   ins->scratchbuf[0] = '\0';
 }
 
@@ -12220,7 +12369,7 @@ OP_sI (instr_info *ins, int bytemode, int sizeflag)
 
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, op);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
 }
 
 static void
@@ -12278,7 +12427,7 @@ static void
 OP_SEG (instr_info *ins, int bytemode, int sizeflag)
 {
   if (bytemode == w_mode)
-    oappend_maybe_intel (ins, att_names_seg[ins->modrm.reg]);
+    oappend_register (ins, att_names_seg[ins->modrm.reg]);
   else
     OP_E (ins, ins->modrm.mod == 3 ? bytemode : w_mode, sizeflag);
 }
@@ -12324,12 +12473,12 @@ OP_OFF (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	  oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 	  oappend (ins, ":");
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12354,12 +12503,12 @@ OP_OFF64 (instr_info *ins, int bytemode, int sizeflag)
     {
       if (!ins->active_seg_prefix)
 	{
-	  oappend_maybe_intel (ins, att_names_seg[ds_reg - es_reg]);
+	  oappend_register (ins, att_names_seg[ds_reg - es_reg]);
 	  oappend (ins, ":");
 	}
     }
   print_operand_value (ins, ins->scratchbuf, 1, off);
-  oappend (ins, ins->scratchbuf);
+  oappend_with_style (ins, ins->scratchbuf, dis_style_address_offset);
 }
 
 static void
@@ -12380,9 +12529,8 @@ ptr_reg (instr_info *ins, int code, int sizeflag)
     s = att_names32[code - eAX_reg];
   else
     s = att_names16[code - eAX_reg];
-  oappend_maybe_intel (ins, s);
-  *ins->obufp++ = ins->close_char;
-  *ins->obufp = 0;
+  oappend_register (ins, s);
+  oappend_char (ins, ins->close_char);
 }
 
 static void
@@ -12405,7 +12553,8 @@ OP_ESreg (instr_info *ins, int code, int sizeflag)
 	  intel_operand_size (ins, b_mode, sizeflag);
 	}
     }
-  oappend_maybe_intel (ins, "%es:");
+  oappend_register (ins, "%es");
+  oappend_char (ins, ':');
   ptr_reg (ins, code, sizeflag);
 }
 
@@ -12455,7 +12604,7 @@ OP_C (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
   else
     add = 0;
   sprintf (ins->scratchbuf, "%%cr%d", ins->modrm.reg + add);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_register (ins, ins->scratchbuf);
 }
 
 static void
@@ -12480,7 +12629,7 @@ OP_T (instr_info *ins, int dummy ATTRIBUTE_UNUSED,
       int sizeflag ATTRIBUTE_UNUSED)
 {
   sprintf (ins->scratchbuf, "%%tr%d", ins->modrm.reg);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_register (ins, ins->scratchbuf);
 }
 
 static void
@@ -12500,7 +12649,7 @@ OP_MMX (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -12575,7 +12724,7 @@ print_vector_reg (instr_info *ins, unsigned int reg, int bytemode)
     }
   else
     names = att_names_xmm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -12635,7 +12784,7 @@ OP_EM (instr_info *ins, int bytemode, int sizeflag)
     }
   else
     names = att_names_mm;
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 /* cvt* are the only instructions in sse2 which have
@@ -12661,7 +12810,7 @@ OP_EMC (instr_info *ins, int bytemode, int sizeflag)
   MODRM_CHECK;
   ins->codep++;
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.rm]);
+  oappend_register (ins, att_names_mm[ins->modrm.rm]);
 }
 
 static void
@@ -12669,7 +12818,7 @@ OP_MXC (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 	int sizeflag ATTRIBUTE_UNUSED)
 {
   ins->used_prefixes |= (ins->prefixes & PREFIX_DATA);
-  oappend_maybe_intel (ins, att_names_mm[ins->modrm.reg]);
+  oappend_register (ins, att_names_mm[ins->modrm.reg]);
 }
 
 static void
@@ -12845,7 +12994,7 @@ OP_3DNowSuffix (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
   ins->obufp = ins->mnemonicendp;
   mnemonic = Suffix3DNow[*ins->codep++ & 0xff];
   if (mnemonic)
-    oappend (ins, mnemonic);
+    ins->obufp = stpcpy (ins->obufp, mnemonic);
   else
     {
       /* Since a variable sized ins->modrm/ins->sib chunk is between the start
@@ -12934,7 +13083,7 @@ CMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -12991,7 +13140,7 @@ BadOp (instr_info *ins)
 {
   /* Throw away prefixes and 1st. opcode byte.  */
   ins->codep = ins->insn_codep + 1;
-  oappend (ins, "(bad)");
+  ins->obufp = stpcpy (ins->obufp, "(bad)");
 }
 
 static void
@@ -13155,7 +13304,7 @@ XMM_Fixup (instr_info *ins, int reg, int sizeflag ATTRIBUTE_UNUSED)
 	  abort ();
 	}
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -13204,7 +13353,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   switch (bytemode)
     {
     case scalar_mode:
-      oappend_maybe_intel (ins, att_names_xmm[reg]);
+      oappend_register (ins, att_names_xmm[reg]);
       return;
 
     case vex_vsib_d_w_dq_mode:
@@ -13215,9 +13364,9 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       if (ins->vex.length == 128
 	  || (bytemode != vex_vsib_d_w_dq_mode
 	      && !ins->vex.w))
-	oappend_maybe_intel (ins, att_names_xmm[reg]);
+	oappend_register (ins, att_names_xmm[reg]);
       else
-	oappend_maybe_intel (ins, att_names_ymm[reg]);
+	oappend_register (ins, att_names_ymm[reg]);
 
       /* All 3 XMM/YMM registers must be distinct.  */
       modrm_reg = ins->modrm.reg;
@@ -13249,7 +13398,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
 	  /* This must be the 3rd operand.  */
 	  if (ins->obufp != ins->op_out[2])
 	    abort ();
-	  oappend_maybe_intel (ins, att_names_tmm[reg]);
+	  oappend_register (ins, att_names_tmm[reg]);
 	  if (reg == ins->modrm.reg || reg == ins->modrm.rm)
 	    strcpy (ins->obufp, "/(bad)");
 	}
@@ -13327,7 +13476,7 @@ OP_VEX (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
       abort ();
       break;
     }
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 }
 
 static void
@@ -13370,7 +13519,7 @@ OP_REG_VexI4 (instr_info *ins, int bytemode, int sizeflag ATTRIBUTE_UNUSED)
   if (bytemode == x_mode && ins->vex.length == 256)
     names = att_names_ymm;
 
-  oappend_maybe_intel (ins, names[reg]);
+  oappend_register (ins, names[reg]);
 
   if (ins->vex.w)
     {
@@ -13387,7 +13536,7 @@ OP_VexI4 (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
 {
   ins->scratchbuf[0] = '$';
   print_operand_value (ins, ins->scratchbuf + 1, 1, ins->codep[-1] & 0xf);
-  oappend_maybe_intel (ins, ins->scratchbuf);
+  oappend_immediate (ins, ins->scratchbuf);
 }
 
 static void
@@ -13432,7 +13581,7 @@ VPCMP_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13484,7 +13633,7 @@ VPCOM_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, cmp_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
@@ -13532,7 +13681,7 @@ PCLMUL_Fixup (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
       /* We have a reserved extension byte.  Output it directly.  */
       ins->scratchbuf[0] = '$';
       print_operand_value (ins, ins->scratchbuf + 1, 1, pclmul_type);
-      oappend_maybe_intel (ins, ins->scratchbuf);
+      oappend_immediate (ins, ins->scratchbuf);
       ins->scratchbuf[0] = '\0';
     }
 }
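
For anyone reading the hunk for oappend_insert_style in isolation, the
marker scheme can be sketched as a small standalone program fragment.
This is only an illustration under stated assumptions: SKETCH_MARKER is a
made-up stand-in for STYLE_MARKER_CHAR (whose real value is not shown in
this hunk), the sketch_* names are hypothetical, and the decode helper is
a guess at how a consumer of the buffer might read a marker back, not the
code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Assumption: stand-in for the patch's STYLE_MARKER_CHAR.  */
#define SKETCH_MARKER '\002'

/* Write the three-byte marker for STYLE (0..15) at P, null terminate
   the buffer, and return the new end of the string.  */
static char *
sketch_insert_style (char *p, unsigned style)
{
  assert (style <= 0xf);	/* One hex digit is all we can encode.  */
  *p++ = SKETCH_MARKER;
  *p++ = (char) (style < 10 ? '0' + style : 'a' + (style - 10));
  *p++ = SKETCH_MARKER;
  *p = '\0';
  return p;
}

/* Append S at P, preceded by the marker for STYLE.  Returns the new
   end of the string.  */
static char *
sketch_append (char *p, const char *s, unsigned style)
{
  p = sketch_insert_style (p, style);
  strcpy (p, s);
  return p + strlen (s);
}

/* Hypothetical reader: return the style number encoded at P, or -1 if
   P does not point at a well-formed marker.  */
static int
sketch_decode_style (const char *p)
{
  if (p[0] != SKETCH_MARKER || p[1] == '\0' || p[2] != SKETCH_MARKER)
    return -1;
  if (p[1] >= '0' && p[1] <= '9')
    return p[1] - '0';
  if (p[1] >= 'a' && p[1] <= 'f')
    return p[1] - 'a' + 10;
  return -1;
}
```

With this sketch, appending "%eax" with style number 4 writes seven
bytes: the marker, the digit '4', the marker again, and then the four
characters of the register name.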



* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-05-31 17:20                 ` Andrew Burgess
@ 2022-06-01  5:59                   ` Jan Beulich
  2022-06-01 15:56                     ` H.J. Lu
  0 siblings, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-06-01  5:59 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils, H.J. Lu

On 31.05.2022 19:20, Andrew Burgess wrote:
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> On 27.05.2022 19:44, Andrew Burgess via Binutils wrote:
>>> @@ -9299,11 +9304,117 @@ get_sib (instr_info *ins, int sizeflag)
>>>  }
>>>  
>>>  /* Like oappend (below), but S is a string starting with '%'.
>>> -   In Intel syntax, the '%' is elided.  */
>>> +   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
>>> +   part of the output in the disassembler.
>>
>> As you're touching this comment anyway, can you add reference to
>> '$'?
> 
> Done.
> 
>>      Or alternatively (that's what I was envisioning with the
>> comment on v2) drop this function altogether, doing what it does
>> separately in oappend_register() and oappend_immediate()?
> 
> I didn't do this (though I will if you insist), I'd just prefer to keep
> the "magic" for how we handle the intel syntax (character skipping) in a
> single place.

I won't insist; I may do this subsequently though in a follow-on
patch.

>>> @@ -9404,8 +9515,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>>  
>>>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>>>      {
>>> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>>> -					 _("64-bit address is disabled"));
>>> +      i386_dis_printf (ins, dis_style_text, _("64-bit address is disabled"));
>>
>> Just wondering: Couldn't there be an "error" style?
> 
> I've avoided an error style because I don't think the disassembler
> _should_ be emitting errors like this.
> 
> I'll go so far as to say that I consider this case a bug in the i386
> disassembler.
> 
> IMHO, if we pass some content to the disassembler then it should
> disassemble it to something, that might just be .word or .byte
> directives rather than real instructions, but we should disassemble to
> something.
> 
> In the above, isn't the "error" really just a reflection that the
> disassembler has been written using bfd_vma in places where either
> uint64_t or int64_t would have been a better choice?
> 
> If we did decide that the disassembler should be able to handle errors
> other than memory errors, I think the correct solution would be to
> either add (yet) another callback which is like the memory error
> callback, but for different errors.  Or, modify the existing error
> callback to handle different types of error maybe....
> 
> ... anyway, I don't think we should do that, but I don't think we should
> add an error style either as I feel it will just encourage bad behaviour
> when writing the disassemblers.

That's certainly a fair view to have, albeit I'm not sure I fully
share it. In some cases I consider it more helpful for the
disassembler to at least provide a hint at what's wrong in a
given encoding.

> Patch below includes the updates you asked for above.

Thanks, lgtm. It'll want to be H.J. though to approve of this going
in.

Jan
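
For reference, the Intel-syntax "character skipping" discussed above can
be sketched as a tiny standalone helper. This is a hedged illustration
only: sketch_strip_att_prefix is a hypothetical name, and unlike the
helper under review it checks that the first character really is '%' or
'$' before dropping it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: return the part of S that should be printed.
   In Intel syntax a leading '%' (register) or '$' (immediate) is
   elided; in AT&T syntax S is returned unchanged.  */
static const char *
sketch_strip_att_prefix (const char *s, bool intel_syntax)
{
  assert (s != NULL);
  if (intel_syntax && (s[0] == '%' || s[0] == '$'))
    return s + 1;
  return s;
}
```

Keeping this in one helper matches the preference stated above for having
the skipping logic in a single place; splitting it between separate
register and immediate helpers would only mean each one tests for its own
prefix character.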



* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-06-01  5:59                   ` Jan Beulich
@ 2022-06-01 15:56                     ` H.J. Lu
  2022-06-08 16:03                       ` Andrew Burgess
  0 siblings, 1 reply; 29+ messages in thread
From: H.J. Lu @ 2022-06-01 15:56 UTC (permalink / raw)
  To: Jan Beulich; +Cc: Andrew Burgess, Binutils

On Tue, May 31, 2022 at 10:59 PM Jan Beulich <jbeulich@suse.com> wrote:
>
> On 31.05.2022 19:20, Andrew Burgess wrote:
> > Jan Beulich via Binutils <binutils@sourceware.org> writes:
> >> On 27.05.2022 19:44, Andrew Burgess via Binutils wrote:
> >>> @@ -9299,11 +9304,117 @@ get_sib (instr_info *ins, int sizeflag)
> >>>  }
> >>>
> >>>  /* Like oappend (below), but S is a string starting with '%'.
> >>> -   In Intel syntax, the '%' is elided.  */
> >>> +   In Intel syntax, the '%' is elided.  STYLE is used when displaying this
> >>> +   part of the output in the disassembler.
> >>
> >> As you're touching this comment anyway, can you add reference to
> >> '$'?
> >
> > Done.
> >
> >>      Or alternatively (that's what I was envisioning with the
> >> comment on v2) drop this function altogether, doing what it does
> >> separately in oappend_register() and oappend_immediate()?
> >
> > I didn't do this (though I will if you insist), I'd just prefer to keep
> > the "magic" for how we handle the intel syntax (character skipping) in a
> > single place.
>
> I won't insist; I may do this subsequently though in a follow-on
> patch.
>
> >>> @@ -9404,8 +9515,7 @@ print_insn (bfd_vma pc, instr_info *ins)
> >>>
> >>>    if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
> >>>      {
> >>> -      (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
> >>> -                                    _("64-bit address is disabled"));
> >>> +      i386_dis_printf (ins, dis_style_text, _("64-bit address is disabled"));
> >>
> >> Just wondering: Couldn't there be an "error" style?
> >
> > I've avoided an error style because I don't think the disassembler
> > _should_ be emitting errors like this.
> >
> > I'll go so far as to say that I consider this case a bug in the i386
> > disassembler.
> >
> > IMHO, if we pass some content to the disassembler then it should
> > disassemble it to something, that might just be .word or .byte
> > directives rather than real instructions, but we should disassemble to
> > something.
> >
> > In the above, isn't the "error" really just a reflection that the
> > disassembler has been written using bfd_vma in places where either
> > uint64_t or int64_t would have been a better choice?
> >
> > If we did decide that the disassembler should be able to handle errors
> > other than memory errors, I think the correct solution would be to
> > either add (yet) another callback which is like the memory error
> > callback, but for different errors.  Or, modify the existing error
> > callback to handle different types of error maybe....
> >
> > ... anyway, I don't think we should do that, but I don't think we should
> > add an error style either as I feel it will just encourage bad behaviour
> > when writing the disassemblers.
>
> That's certainly a fair view to have, albeit I'm not sure I fully
> share it. In some cases I consider it more helpful for the
> disassembler to at least provide a hint at what's wrong in a
> given encoding.
>
> > Patch below includes the updates you asked for above.
>
> Thanks, lgtm. It'll want to be H.J. though to approve of this going
> in.

OK.

Thanks.

-- 
H.J.


* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-06-01 15:56                     ` H.J. Lu
@ 2022-06-08 16:03                       ` Andrew Burgess
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-06-08 16:03 UTC (permalink / raw)
  To: H.J. Lu, Jan Beulich; +Cc: Binutils


Thank you both for your reviews.

I've now pushed this patch.

Andrew



* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-05-27 17:44             ` [PATCHv3] " Andrew Burgess
  2022-05-30  8:19               ` Jan Beulich
@ 2022-06-10 10:56               ` Jan Beulich
  2022-06-10 13:01                 ` Andrew Burgess
  1 sibling, 1 reply; 29+ messages in thread
From: Jan Beulich @ 2022-06-10 10:56 UTC (permalink / raw)
  To: Andrew Burgess; +Cc: binutils

On 27.05.2022 19:44, Andrew Burgess via Binutils wrote:
> @@ -11595,11 +11750,15 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>  	      print_displacement (ins, ins->scratchbuf, disp);
>  	    else
>  	      print_operand_value (ins, ins->scratchbuf, 1, disp);
> -	    oappend (ins, ins->scratchbuf);
> +	    oappend_with_style (ins, ins->scratchbuf,
> +				dis_style_address_offset);

Is there a reason you changed this to dis_style_address_offset, but
not the other cases where print_displacement() is used (always for
similar purposes)? I'm asking because I'm going to touch all these
instances, so if dis_style_address_offset was always meant to be
used there, I'd switch that around kind of as a side effect.

Jan


* Re: [PATCHv3] libopcodes: extend the styling within the i386 disassembler
  2022-06-10 10:56               ` Jan Beulich
@ 2022-06-10 13:01                 ` Andrew Burgess
  0 siblings, 0 replies; 29+ messages in thread
From: Andrew Burgess @ 2022-06-10 13:01 UTC (permalink / raw)
  To: Jan Beulich; +Cc: binutils

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 27.05.2022 19:44, Andrew Burgess via Binutils wrote:
>> @@ -11595,11 +11750,15 @@ OP_E_memory (instr_info *ins, int bytemode, int sizeflag)
>>  	      print_displacement (ins, ins->scratchbuf, disp);
>>  	    else
>>  	      print_operand_value (ins, ins->scratchbuf, 1, disp);
>> -	    oappend (ins, ins->scratchbuf);
>> +	    oappend_with_style (ins, ins->scratchbuf,
>> +				dis_style_address_offset);
>
> Is there a reason you changed this to dis_style_address_offset, but
> not the other cases where print_displacement() is used (always for
> similar purposes)? I'm asking because I'm going to touch all these
> instances, so if dis_style_address_offset was always meant to be
> used there, I'd switch that around kind of as a side effect.

I think they should all be dis_style_address_offset.  Sorry for missing
these.

Thanks,
Andrew
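
To make the point about displacements concrete, here is a rough
standalone sketch of an Intel-syntax displacement split into a sign piece
and a magnitude piece, with the magnitude tagged as an address offset.
Everything here is an assumption for illustration: the sketch_* names and
the two-value style enum are hypothetical, and the sign handling is a
simplification of the conditions in OP_E_memory, not a copy of them.

```c
#include <assert.h>
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical style tags standing in for enum disassembler_style.  */
enum sketch_style { SKETCH_TEXT, SKETCH_ADDRESS_OFFSET };

struct sketch_piece
{
  enum sketch_style style;
  char text[24];
};

/* Split DISP into a sign piece (text style) and a magnitude piece
   (address-offset style).  OUT must have room for two pieces.  */
static void
sketch_split_disp (int64_t disp, struct sketch_piece out[2])
{
  /* Negate in unsigned arithmetic so that INT64_MIN is handled.  */
  uint64_t mag = disp < 0 ? 0 - (uint64_t) disp : (uint64_t) disp;

  out[0].style = SKETCH_TEXT;
  out[0].text[0] = disp < 0 ? '-' : '+';
  out[0].text[1] = '\0';

  out[1].style = SKETCH_ADDRESS_OFFSET;
  snprintf (out[1].text, sizeof out[1].text, "0x%" PRIx64, mag);
}
```

The idea is simply that every place which prints a displacement would
tag the numeric part with the same style, whichever code path it came
from.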



Thread overview: 29+ messages
2022-04-29 13:42 [PATCH 0/2] Disassembler styling for i386-dis.c Andrew Burgess
2022-04-29 13:42 ` [PATCH 1/2] objdump: fix styled printing of addresses Andrew Burgess
2022-05-02  7:14   ` Jan Beulich
2022-05-03  9:52     ` Andrew Burgess
2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
2022-04-29 18:16   ` Vladimir Mezentsev
2022-05-03 13:15     ` Andrew Burgess
2022-04-29 18:57   ` H.J. Lu
2022-05-03 13:14     ` Andrew Burgess
2022-05-02  7:28   ` Jan Beulich
2022-05-03 13:12     ` Andrew Burgess
2022-05-03 15:47       ` H.J. Lu
2022-05-04  7:58       ` Jan Beulich
2022-05-09  9:48         ` Andrew Burgess
2022-05-09 12:54           ` [PATCHv2] " Andrew Burgess
2022-05-18 12:27             ` Jan Beulich
2022-05-26 12:48               ` Andrew Burgess
2022-05-18 21:23             ` H.J. Lu
2022-05-27 17:44             ` [PATCHv3] " Andrew Burgess
2022-05-30  8:19               ` Jan Beulich
2022-05-31 17:20                 ` Andrew Burgess
2022-06-01  5:59                   ` Jan Beulich
2022-06-01 15:56                     ` H.J. Lu
2022-06-08 16:03                       ` Andrew Burgess
2022-06-10 10:56               ` Jan Beulich
2022-06-10 13:01                 ` Andrew Burgess
2022-05-18  7:06           ` [PATCH 2/2] " Jan Beulich
2022-05-18 10:41             ` Andrew Burgess
2022-05-18 10:46               ` Jan Beulich
