public inbox for binutils@sourceware.org
From: Andrew Burgess <aburgess@redhat.com>
To: Vladimir Mezentsev <vladimir.mezentsev@oracle.com>,
	binutils@sourceware.org
Subject: Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
Date: Tue, 03 May 2022 14:15:36 +0100
Message-ID: <87tua6n4ef.fsf@redhat.com>
In-Reply-To: <4e7afdd1-6721-2b47-6eab-53f8f0a0dd46@oracle.com>

Vladimir Mezentsev via Binutils <binutils@sourceware.org> writes:

> On 4/29/22 06:42, Andrew Burgess via Binutils wrote:
>> The i386 disassembler is pretty complex.  Most disassembly is done
>> indirectly; operands are built into buffers within a struct instr_info
>> instance, before finally being printed later in the disassembly
>> process.
>>
>> Sometimes the operand buffers are built in a different order to the
>> order in which they will eventually be printed.
>>
>> Each operand can contain multiple components, e.g. multiple registers,
>> immediates, other textual elements (commas, brackets, etc).
>>
>> When looking for how to apply styling I guess the ideal solution would
>> be to move away from the operands being a single string that is built
>> up, and instead have each operand be a list of "parts", where each
>> part is some text and a style.  Then, when we eventually print the
>> operand we would loop over the parts and print each part with the
>> correct style.
>>
>> But it feels like a huge amount of work to move from where we are
>> now to that potentially ideal solution.  Plus, the above solution
>> would be pretty complex.
>>
>> So, instead I propose a .... different solution here, one that works
>> with the existing infrastructure.
>>
>> As each operand is built up, piece by piece, we pass through style
>> information.  This style information is then encoded into the operand
>> buffer (see below for details).  After this the code can continue to
>> operate as it does right now in order to manage the set of operand
>> buffers.
>>
>> Then, as each operand is printed we can split the operand buffer into
>> chunks at the style marker boundaries, with each chunk being printed
>> in the correct style.
>>
>> For encoding the style information I use the format "~%x~".  As far as
>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>> this should serve as a unique marker.  To speed up writing and then
>> reading the style markers, I take advantage of the fact that there are
>> fewer than 16 styles, so I know the '%x' will only ever be a single hex
>> character.
>>
>> In some (not very scientific) benchmarking on my machine,
>> disassembling a reasonably large (142M) shared library, I'm not seeing
>> any significant slow down in disassembler speed with this change.
>>
>> Most instructions are now being fully syntax highlighted when I
>> disassemble using the --disassembler-color=extended-color option.  I'm
>> sure that there are probably still a few corner cases that need fixing
>> up, but we can come back to them later I think.
>>
>> When disassembler syntax highlighting is not being used, then there
>> should be no user visible changes after this commit.
>> ---
>>   opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>>   1 file changed, 332 insertions(+), 239 deletions(-)
>>
>> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
>> index 1e3266329c1..c94d316a03f 100644
>> --- a/opcodes/i386-dis.c
>> +++ b/opcodes/i386-dis.c
>> @@ -42,12 +42,14 @@
>>   #include <setjmp.h>
>>   typedef struct instr_info instr_info;
>>   
>> +#define STYLE_BUFFER_SIZE 10
>> +
>>   static int print_insn (bfd_vma, instr_info *);
>>   static void dofloat (instr_info *, int);
>>   static void OP_ST (instr_info *, int, int);
>>   static void OP_STi (instr_info *, int, int);
>>   static int putop (instr_info *, const char *, int);
>> -static void oappend (instr_info *, const char *);
>> +static void oappend (instr_info *, const char *, enum disassembler_style);
>>   static void append_seg (instr_info *);
>>   static void OP_indirE (instr_info *, int, int);
>>   static void print_operand_value (instr_info *, char *, int, bfd_vma);
>> @@ -166,6 +168,8 @@ struct instr_info
>>     char *obufp;
>>     char *mnemonicendp;
>>     char scratchbuf[100];
>> +  char style_buffer[STYLE_BUFFER_SIZE];
>
> I don't see where style_buffer is used.
> It looks like style_buffer and STYLE_BUFFER_SIZE are not needed.
>
>> +  char staging_area[100];
>
> staging_area is used only in i386_dis_printf().
> Why is this not a local array inside i386_dis_printf()?
>
>
>>     unsigned char *start_codep;
>>     unsigned char *insn_codep;
>>     unsigned char *codep;
>> @@ -248,6 +252,8 @@ struct instr_info
>>   
>>     enum x86_64_isa isa64;
>>   
>> +  int (*printf) (instr_info *ins, enum disassembler_style style,
>> +		 const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>>   };
>>   
>>   /* Mark parts used in the REX prefix.  When we are testing for
>> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>>   /* Like oappend (below), but S is a string starting with '%'.
>>      In Intel syntax, the '%' is elided.  */
>>   static void
>> -oappend_maybe_intel (instr_info *ins, const char *s)
>> +oappend_maybe_intel (instr_info *ins, const char *s,
>> +		     enum disassembler_style style)
>>   {
>> -  oappend (ins, s + ins->intel_syntax);
>> +  oappend (ins, s + ins->intel_syntax, style);
>> +}
>> +
>> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
>> +   STYLE is the default style to use in the fprintf_styled_func calls,
>> +   however, FMT might include embedded style markers (see oappend_style),
>> +   these embedded markers are not printed, but instead change the style
>> +   used in the next fprintf_styled_func call.
>> +
>> +   Return non-zero to indicate the print call was a success.  */
>> +
>> +static int ATTRIBUTE_PRINTF_3
>> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
>> +		 const char *fmt, ...)
>> +{
>> +  va_list ap;
>> +  enum disassembler_style curr_style = style;
>> +  char *start, *curr;
>> +
>> +  va_start (ap, fmt);
>> +  vsnprintf (ins->staging_area, 100, fmt, ap);
>
> Using sizeof (ins->staging_area) instead of 100 might be better.
>
> As I wrote above, staging_area can be declared inside i386_dis_printf.

Vladimir,

Thanks, I've addressed all these issues in my local branch.  Once I've
resolved the use of '~' that H.J. and Jan have asked about, I'll post an
updated version.

Thanks,
Andrew
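
For readers following the thread, here is a rough sketch of the "~%x~"
marker scheme described in the quoted commit message: a style is written
into the operand buffer as a single hex digit between two '~' characters,
and at print time the buffer is split into chunks at those markers.  This
is not the code from the patch.  Everything prefixed demo_ or DEMO_ is
hypothetical (including the style values, which only stand in for enum
disassembler_style), and the '~' marker itself was still under discussion
at this point in the thread.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for enum disassembler_style.  Fewer than 16
   values, so each one fits in a single hex digit.  */
enum demo_style { DEMO_TEXT = 0, DEMO_MNEMONIC = 1,
                  DEMO_REGISTER = 2, DEMO_IMMEDIATE = 3 };

/* Callback that receives one chunk of text together with its style.  */
typedef void (*demo_emit_fn) (void *ctx, enum demo_style style,
                              const char *text, size_t len);

/* Append "~X~" followed by TEXT to the NUL-terminated BUF of SIZE bytes.
   Return 0 on success, -1 if the result did not fit.  */
static int
demo_append_styled (char *buf, size_t size, enum demo_style style,
                    const char *text)
{
  size_t used = strlen (buf);
  int n;

  assert (used < size);
  assert ((unsigned) style < 16);   /* Must encode as one hex digit.  */
  n = snprintf (buf + used, size - used, "~%x~%s", (unsigned) style, text);
  return (n < 0 || (size_t) n >= size - used) ? -1 : 0;
}

/* Convert a character already known to be a hex digit to its value.  */
static int
demo_hex_value (char c)
{
  return c <= '9' ? c - '0' : (c | 0x20) - 'a' + 10;
}

/* Split BUF at "~X~" markers and hand each chunk to EMIT.  STYLE is the
   style used for any text that precedes the first marker.  */
static void
demo_print_styled (const char *buf, enum demo_style style,
                   demo_emit_fn emit, void *ctx)
{
  const char *start = buf;
  const char *p = buf;

  while (*p != '\0')
    {
      if (p[0] == '~' && isxdigit ((unsigned char) p[1]) && p[2] == '~')
        {
          if (p > start)
            emit (ctx, style, start, (size_t) (p - start));
          style = (enum demo_style) demo_hex_value (p[1]);
          p += 3;               /* Skip the marker; it is never printed.  */
          start = p;
        }
      else
        ++p;
    }
  if (p > start)
    emit (ctx, style, start, (size_t) (p - start));
}

/* Demo-only emitter: appends "[style:text]" to the string in CTX.  The
   caller must make sure CTX has enough room.  */
static void
demo_collect (void *ctx, enum demo_style style, const char *text,
              size_t len)
{
  char *out = ctx;

  sprintf (out + strlen (out), "[%d:%.*s]", (int) style, (int) len, text);
}
```

In the real disassembler the emit step would presumably be a call through
fprintf_styled_func rather than a collector like demo_collect; the point
of the sketch is only that the operand buffers stay plain strings while
still carrying style changes.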

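The two review comments about staging_area (make it a local array, and
size the vsnprintf call with sizeof rather than a literal 100) could look
roughly like the sketch below.  It is only an illustration under
assumptions: demo_format is a hypothetical name, and it copies the result
into a caller-supplied buffer, whereas the function in the patch prints
through the disassembler's styled printf callback.

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Format FMT into OUT (OUT_SIZE bytes).  Return non-zero on success,
   zero if the text was truncated or could not be formatted.  */
static int
demo_format (char *out, size_t out_size, const char *fmt, ...)
{
  /* Local to the function, so nothing extra lives in the instruction
     state structure.  */
  char staging_area[100];
  va_list ap;
  int res;

  assert (out != NULL && out_size > 0);

  va_start (ap, fmt);
  /* sizeof tracks the array if its size is ever changed.  */
  res = vsnprintf (staging_area, sizeof (staging_area), fmt, ap);
  va_end (ap);

  if (res < 0 || (size_t) res >= sizeof (staging_area))
    return 0;                   /* Error, or output was truncated.  */
  if ((size_t) res >= out_size)
    return 0;                   /* Caller's buffer is too small.  */

  memcpy (out, staging_area, (size_t) res + 1);   /* Include the NUL.  */
  return 1;
}
```

Note that sizeof only gives the array size because staging_area is a true
array here; it would give the pointer size if the buffer were passed in
as a char *.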
