From: Andrew Burgess <aburgess@redhat.com>
To: "H.J. Lu" <hjl.tools@gmail.com>
Cc: Binutils <binutils@sourceware.org>
Subject: Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
Date: Tue, 03 May 2022 14:14:43 +0100 [thread overview]
Message-ID: <87wnf2n4fw.fsf@redhat.com> (raw)
In-Reply-To: <CAMe9rOoNT_7wLHXbdtVUiMHrJXOfnc23VkOPPacXZoDfgcijvg@mail.gmail.com>
"H.J. Lu via Binutils" <binutils@sourceware.org> writes:
> .
> w On Fri, Apr 29, 2022 at 6:48 AM Andrew Burgess via Binutils
> <binutils@sourceware.org> wrote:
>>
>> The i386 disassembler is pretty complex. Most disassembly is done
>> indirectly; operands are built into buffers within a struct instr_info
>> instance, before finally being printed later in the disassembly
>> process.
>>
>> Sometimes the operand buffers are built in a different order to the
>> order in which they will eventually be printed.
>>
>> Each operand can contain multiple components, e.g. multiple registers,
>> immediates, other textual elements (commas, brackets, etc).
>>
>> When looking for how to apply styling I guess the ideal solution would
>> be to move away from the operands being a single string that is built
>> up, and instead have each operand be a list of "parts", where each
>> part is some text and a style. Then, when we eventually print the
>> operand we would loop over the parts and print each part with the
>> correct style.
>>
>> But it feels like a huge amount of work to move from where we are
>> now to that potentially ideal solution. Plus, the above solution
>> would be pretty complex.
>>
>> So, instead I propose a .... different solution here, one that works
>> with the existing infrastructure.
>>
>> As each operand is built up, piece be piece, we pass through style
>> information. This style information is then encoded into the operand
>> buffer (see below for details). After this the code can continue to
>> operate as it does right now in order to manage the set of operand
>> buffers.
>>
>> Then, as each operand is printed we can split the operand buffer into
>> chunks at the style marker boundaries, with each chunk being printed
>> in the correct style.
>>
>> For encoding the style information I use the format "~%x~". As far as
>> I can tell the '~' is not otherwise used in the i386 disassembler, so
>
> Can you use a non-ascii character instead of ~?
>
>> this should serve as a unique marker. To speed up writing and then
>> reading the style markers, I take advantage of the fact that there are
>> less than 16 styles so I know the '%x' will only ever be a single hex
>> character.
>>
>> In some (not very scientific) benchmarking on my machine,
>> disassembling a reasonably large (142M) shared library, I'm not seeing
>> any significant slow down in disassembler speed with this change.
>>
>> Most instructions are now being fully syntax highlighted when I
>> disassemble using the --disassembler-color=extended-color option. I'm
>> sure that there are probably still a few corner cases that need fixing
>> up, but we can come back to them later I think.
>>
>> When disassembler syntax highlighting is not being used, then there
>> should be no user visible changes after this commit.
>> ---
>> opcodes/i386-dis.c | 571 ++++++++++++++++++++++++++-------------------
>> 1 file changed, 332 insertions(+), 239 deletions(-)
>>
>> diff --git a/opcodes/i386-dis.c b/opcodes/i386-dis.c
>> index 1e3266329c1..c94d316a03f 100644
>> --- a/opcodes/i386-dis.c
>> +++ b/opcodes/i386-dis.c
>> @@ -42,12 +42,14 @@
>> #include <setjmp.h>
>> typedef struct instr_info instr_info;
>>
>> +#define STYLE_BUFFER_SIZE 10
>> +
>> static int print_insn (bfd_vma, instr_info *);
>> static void dofloat (instr_info *, int);
>> static void OP_ST (instr_info *, int, int);
>> static void OP_STi (instr_info *, int, int);
>> static int putop (instr_info *, const char *, int);
>> -static void oappend (instr_info *, const char *);
>> +static void oappend (instr_info *, const char *, enum disassembler_style);
>
> Please add a new function, oappend_with_style, to take a new
> argument and change oappend to call oappend_with_style with
> dis_style_text.
>
>> static void append_seg (instr_info *);
>> static void OP_indirE (instr_info *, int, int);
>> static void print_operand_value (instr_info *, char *, int, bfd_vma);
>> @@ -166,6 +168,8 @@ struct instr_info
>> char *obufp;
>> char *mnemonicendp;
>> char scratchbuf[100];
>> + char style_buffer[STYLE_BUFFER_SIZE];
>> + char staging_area[100];
>> unsigned char *start_codep;
>> unsigned char *insn_codep;
>> unsigned char *codep;
>> @@ -248,6 +252,8 @@ struct instr_info
>>
>> enum x86_64_isa isa64;
>>
>> + int (*printf) (instr_info *ins, enum disassembler_style style,
>> + const char *fmt, ...) ATTRIBUTE_FPTR_PRINTF_3;
>> };
>>
>> /* Mark parts used in the REX prefix. When we are testing for
>> @@ -9300,9 +9306,73 @@ get_sib (instr_info *ins, int sizeflag)
>> /* Like oappend (below), but S is a string starting with '%'.
>> In Intel syntax, the '%' is elided. */
>> static void
>> -oappend_maybe_intel (instr_info *ins, const char *s)
>> +oappend_maybe_intel (instr_info *ins, const char *s,
>> + enum disassembler_style style)
>
> oappend_maybe_intel_wityh_style
>
>> {
>> - oappend (ins, s + ins->intel_syntax);
>> + oappend (ins, s + ins->intel_syntax, style);
>> +}
>> +
>> +/* Wrap around a call to INS->info->fprintf_styled_func, printing FMT.
>> + STYLE is the default style to use in the fprintf_styled_func calls,
>> + however, FMT might include embedded style markers (see oappend_style),
>> + these embedded markers are not printed, but instead change the style
>> + used in the next fprintf_styled_func call.
>> +
>> + Return non-zero to indicate the print call was a success. */
>> +
>> +static int ATTRIBUTE_PRINTF_3
>> +i386_dis_printf (instr_info *ins, enum disassembler_style style,
>> + const char *fmt, ...)
>> +{
>> + va_list ap;
>> + enum disassembler_style curr_style = style;
>> + char *start, *curr;
>> +
>> + va_start (ap, fmt);
>> + vsnprintf (ins->staging_area, 100, fmt, ap);
>> + va_end (ap);
>> +
>> + start = curr = ins->staging_area;
>> +
>> + do
>> + {
>> + if (*curr == '\0' || *curr == '~')
>> + {
>> + /* Output content between our START position and CURR. */
>> + int len = curr - start;
>> + (*ins->info->fprintf_styled_func) (ins->info->stream, curr_style,
>> + "%.*s", len, start);
>> + if (*curr == '\0')
>> + break;
>> +
>> + /* Update the CURR_STYLE, it is possible here that if the input
>> + is corrupted in some way, then we may set CURR_STYLE to an
>> + invalid value. Don't worry though, we check for that in a
>> + subsequent if block. */
>> + ++curr;
>> + if (*curr >= '0' && *curr <= '9')
>> + curr_style = (enum disassembler_style) (*curr - '0');
>> + else if (*curr >= 'a' && *curr <= 'f')
>> + curr_style = (enum disassembler_style) (*curr - 'a' + 10);
>> + else
>> + curr_style = dis_style_text;
>> +
>> + /* Skip over the hex character, and the closing '~'. Also
>> + validate that CURR_STYLE is set to a valid value. */
>> + ++curr;
>> + if (*curr != '~' || curr_style > dis_style_comment_start)
>> + curr_style = dis_style_text;
>> + ++curr;
>> +
>> + /* Reset the START and CURR pointers to after the style marker. */
>> + start = curr;
>> + }
>> + else
>> + ++curr;
>> + }
>> + while (true);
>> +
>> + return 1;
>> }
>>
>> static int
>> @@ -9317,6 +9387,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>> struct dis_private priv;
>> int prefix_length;
>>
>> + ins->printf = i386_dis_printf;
>> ins->isa64 = 0;
>> ins->intel_mnemonic = !SYSV386_COMPAT;
>> ins->op_is_jump = false;
>> @@ -9401,8 +9472,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>>
>> if (ins->address_mode == mode_64bit && sizeof (bfd_vma) < 8)
>> {
>> - (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> - _("64-bit address is disabled"));
>> + ins->printf (ins, dis_style_text, _("64-bit address is disabled"));
>> return -1;
>> }
>>
>> @@ -9451,16 +9521,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>> {
>> name = prefix_name (ins, priv.the_buffer[0], priv.orig_sizeflag);
>> if (name != NULL)
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_mnemonic, "%s", name);
>> + ins->printf (ins, dis_style_mnemonic, "%s", name);
>> else
>> {
>> /* Just print the first byte as a .byte instruction. */
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_assembler_directive, ".byte ");
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_immediate, "0x%x",
>> - (unsigned int) priv.the_buffer[0]);
>> + ins->printf (ins, dis_style_assembler_directive,
>> + ".byte ");
>> + ins->printf (ins, dis_style_immediate, "0x%x",
>> + (unsigned int) priv.the_buffer[0]);
>> }
>>
>> return 1;
>> @@ -9478,10 +9546,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>> for (i = 0;
>> i < (int) ARRAY_SIZE (ins->all_prefixes) && ins->all_prefixes[i];
>> i++)
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_mnemonic, "%s%s",
>> - (i == 0 ? "" : " "), prefix_name (ins, ins->all_prefixes[i],
>> - sizeflag));
>> + ins->printf (ins, dis_style_mnemonic, "%s%s",
>> + (i == 0 ? "" : " "),
>> + prefix_name (ins, ins->all_prefixes[i], sizeflag));
>> return i;
>> }
>>
>> @@ -9496,11 +9563,9 @@ print_insn (bfd_vma pc, instr_info *ins)
>> /* Handle ins->prefixes before fwait. */
>> for (i = 0; i < ins->fwait_prefix && ins->all_prefixes[i];
>> i++)
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_mnemonic, "%s ",
>> - prefix_name (ins, ins->all_prefixes[i], sizeflag));
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_mnemonic, "fwait");
>> + ins->printf (ins, dis_style_mnemonic, "%s ",
>> + prefix_name (ins, ins->all_prefixes[i], sizeflag));
>> + ins->printf (ins, dis_style_mnemonic, "fwait");
>> return i + 1;
>> }
>>
>> @@ -9569,14 +9634,15 @@ print_insn (bfd_vma pc, instr_info *ins)
>> /* Don't print {%k0}. */
>> if (ins->vex.mask_register_specifier)
>> {
>> - oappend (ins, "{");
>> + oappend (ins, "{", dis_style_text);
>> oappend_maybe_intel (ins,
>> att_names_mask
>> - [ins->vex.mask_register_specifier]);
>> - oappend (ins, "}");
>> + [ins->vex.mask_register_specifier],
>> + dis_style_text);
>> + oappend (ins, "}", dis_style_text);
>> }
>> if (ins->vex.zeroing)
>> - oappend (ins, "{z}");
>> + oappend (ins, "{z}", dis_style_text);
>>
>> /* S/G insns require a mask and don't allow
>> zeroing-masking. */
>> @@ -9584,7 +9650,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>> || dp->op[0].bytemode == vex_vsib_q_w_dq_mode)
>> && (ins->vex.mask_register_specifier == 0
>> || ins->vex.zeroing))
>> - oappend (ins, "/(bad)");
>> + oappend (ins, "/(bad)", dis_style_text);
>> }
>> }
>>
>> @@ -9598,8 +9664,8 @@ print_insn (bfd_vma pc, instr_info *ins)
>> ins->obufp = ins->op_out[i];
>> if (*ins->obufp)
>> continue;
>> - oappend (ins, names_rounding[ins->vex.ll]);
>> - oappend (ins, "bad}");
>> + oappend (ins, names_rounding[ins->vex.ll], dis_style_text);
>> + oappend (ins, "bad}", dis_style_text);
>> break;
>> }
>> }
>> @@ -9649,16 +9715,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>> are all 0s in inverted form. */
>> if (ins->need_vex && ins->vex.register_specifier != 0)
>> {
>> - (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> - "(bad)");
>> + ins->printf (ins, dis_style_text, "(bad)");
>> return ins->end_codep - priv.the_buffer;
>> }
>>
>> /* If EVEX.z is set, there must be an actual mask register in use. */
>> if (ins->vex.zeroing && ins->vex.mask_register_specifier == 0)
>> {
>> - (*ins->info->fprintf_styled_func) (ins->info->stream, dis_style_text,
>> - "(bad)");
>> + ins->printf (ins, dis_style_text, "(bad)");
>> return ins->end_codep - priv.the_buffer;
>> }
>>
>> @@ -9669,8 +9733,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>> the encoding invalid. Most other PREFIX_OPCODE rules still apply. */
>> if (ins->need_vex ? !ins->vex.prefix : !(ins->prefixes & PREFIX_DATA))
>> {
>> - (*ins->info->fprintf_styled_func) (ins->info->stream,
>> - dis_style_text, "(bad)");
>> + ins->printf (ins, dis_style_text, "(bad)");
>> return ins->end_codep - priv.the_buffer;
>> }
>> ins->used_prefixes |= PREFIX_DATA;
>> @@ -9697,8 +9760,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>> || (ins->vex.evex && dp->prefix_requirement != PREFIX_DATA
>> && !ins->vex.w != !(ins->used_prefixes & PREFIX_DATA)))
>> {
>> - (*ins->info->fprintf_styled_func) (ins->info->stream,
>> - dis_style_text, "(bad)");
>> + ins->printf (ins, dis_style_text, "(bad)");
>> return ins->end_codep - priv.the_buffer;
>> }
>> break;
>> @@ -9748,24 +9810,28 @@ print_insn (bfd_vma pc, instr_info *ins)
>> if (name == NULL)
>> abort ();
>> prefix_length += strlen (name) + 1;
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_mnemonic, "%s ", name);
>> + ins->printf (ins, dis_style_mnemonic, "%s ", name);
>> }
>>
>> /* Check maximum code length. */
>> if ((ins->codep - ins->start_codep) > MAX_CODE_LENGTH)
>> {
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_text, "(bad)");
>> + ins->printf (ins, dis_style_text, "(bad)");
>> return MAX_CODE_LENGTH;
>> }
>>
>> - ins->obufp = ins->mnemonicendp;
>> - for (i = strlen (ins->obuf) + prefix_length; i < 6; i++)
>> - oappend (ins, " ");
>> - oappend (ins, " ");
>> - (*ins->info->fprintf_styled_func)
>> - (ins->info->stream, dis_style_mnemonic, "%s", ins->obuf);
>> + i = strlen (ins->obuf);
>> + if (ins->mnemonicendp == ins->obuf + i)
>> + {
>> + i += prefix_length;
>> + if (i < 6)
>> + i = 6 - i + 1;
>> + else
>> + i = 1;
>> + }
>> + else
>> + i = 0;
>> + ins->printf (ins, dis_style_mnemonic, "%s%*s", ins->obuf, i, "");
>>
>> /* The enter and bound instructions are printed with operands in the same
>> order as the intel book; everything else is printed in reverse order. */
>> @@ -9804,8 +9870,7 @@ print_insn (bfd_vma pc, instr_info *ins)
>> if (*op_txt[i])
>> {
>> if (needcomma)
>> - (*ins->info->fprintf_styled_func) (ins->info->stream,
>> - dis_style_text, ",");
>> + ins->printf (ins, dis_style_text, ",");
>> if (ins->op_index[i] != -1 && !ins->op_riprel[i])
>> {
>> bfd_vma target = (bfd_vma) ins->op_address[ins->op_index[i]];
>> @@ -9821,18 +9886,14 @@ print_insn (bfd_vma pc, instr_info *ins)
>> (*ins->info->print_address_func) (target, ins->info);
>> }
>> else
>> - (*ins->info->fprintf_styled_func) (ins->info->stream,
>> - dis_style_text, "%s",
>> - op_txt[i]);
>> + ins->printf (ins, dis_style_text, "%s", op_txt[i]);
>> needcomma = 1;
>> }
>>
>> for (i = 0; i < MAX_OPERANDS; i++)
>> if (ins->op_index[i] != -1 && ins->op_riprel[i])
>> {
>> - (*ins->info->fprintf_styled_func) (ins->info->stream,
>> - dis_style_comment_start,
>> - " # ");
>> + ins->printf (ins, dis_style_comment_start, " # ");
>> (*ins->info->print_address_func) ((bfd_vma)
>> (ins->start_pc + (ins->codep - ins->start_codep)
>> + ins->op_address[ins->op_index[i]]), ins->info);
>> @@ -10217,15 +10278,18 @@ static void
>> OP_ST (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>> int sizeflag ATTRIBUTE_UNUSED)
>> {
>> - oappend_maybe_intel (ins, "%st");
>> + oappend_maybe_intel (ins, "%st", dis_style_text);
>> }
>>
>> static void
>> OP_STi (instr_info *ins, int bytemode ATTRIBUTE_UNUSED,
>> int sizeflag ATTRIBUTE_UNUSED)
>> {
>> - sprintf (ins->scratchbuf, "%%st(%d)", ins->modrm.rm);
>> - oappend_maybe_intel (ins, ins->scratchbuf);
>> + oappend_maybe_intel (ins, "%st", dis_style_text);
>> + oappend (ins, "(", dis_style_text);
>> + sprintf (ins->scratchbuf, "%d", ins->modrm.rm);
>> + oappend (ins, ins->scratchbuf, dis_style_immediate);
>> + oappend (ins, ")", dis_style_text);
>> }
>>
>> /* Capital letters in template are macros. */
>> @@ -10329,7 +10393,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>> if (!ins->vex.evex || ins->vex.w)
>> *ins->obufp++ = 'd';
>> else
>> - oappend (ins, "{bad}");
>> + oappend (ins, "{bad}", dis_style_text);
>> break;
>> default:
>> abort ();
>> @@ -10424,7 +10488,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>> if (!ins->vex.w)
>> *ins->obufp++ = 'h';
>> else
>> - oappend (ins, "{bad}");
>> + oappend (ins, "{bad}", dis_style_text);
>> }
>> else
>> abort ();
>> @@ -10608,7 +10672,7 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>> if (!ins->vex.evex || !ins->vex.w)
>> *ins->obufp++ = 's';
>> else
>> - oappend (ins, "{bad}");
>> + oappend (ins, "{bad}", dis_style_text);
>> break;
>> default:
>> abort ();
>> @@ -10772,12 +10836,47 @@ putop (instr_info *ins, const char *in_template, int sizeflag)
>> return 0;
>> }
>>
>> +/* Add a style marker "~X~" to *INS->obufp that encodes STYLE. This
>> + assumes that the buffer pointed to by INS->obufp has space. In the
>> + style marker 'X' is replaced with a single hex character that represents
>> + STYLE. */
>> +
>> +static void
>> +oappend_style (instr_info *ins, enum disassembler_style style)
>> +{
>> + int num = (int) style;
>> +
>> + /* We currently assume that STYLE can be encoded as a single hex
>> + character. If more styles are added then this might start to fail,
>> + and we'll need to expand this code. */
>> + if (num > 0xf)
>> + abort ();
>> +
>> + *ins->obufp++ = '~';
>> + *ins->obufp++ = (num < 10 ? ('0' + num)
>> + : ((num < 16) ? ('a' + (num - 10)) : '0'));
>> + *ins->obufp++ = '~';
>> + *ins->obufp = '\0';
>
> Do you need '\0'?
No, not really. I found having it in helpful for debug as it leaves the
buffer with a trailing null.
I've left this in, but added a comment explaining that it's not needed,
but maybe helpful - is that OK?
I've addressed all your other comments, except the use of '~' - Jan has
also asked about that, so I've followed up to Jan to get some more
direction before changing that part of the patch.
Thanks,
Andrew
next prev parent reply other threads:[~2022-05-03 15:07 UTC|newest]
Thread overview: 29+ messages / expand[flat|nested] mbox.gz Atom feed top
2022-04-29 13:42 [PATCH 0/2] Disassembler styling for i386-dis.c Andrew Burgess
2022-04-29 13:42 ` [PATCH 1/2] objdump: fix styled printing of addresses Andrew Burgess
2022-05-02 7:14 ` Jan Beulich
2022-05-03 9:52 ` Andrew Burgess
2022-04-29 13:42 ` [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler Andrew Burgess
2022-04-29 18:16 ` Vladimir Mezentsev
2022-05-03 13:15 ` Andrew Burgess
2022-04-29 18:57 ` H.J. Lu
2022-05-03 13:14 ` Andrew Burgess [this message]
2022-05-02 7:28 ` Jan Beulich
2022-05-03 13:12 ` Andrew Burgess
2022-05-03 15:47 ` H.J. Lu
2022-05-04 7:58 ` Jan Beulich
2022-05-09 9:48 ` Andrew Burgess
2022-05-09 12:54 ` [PATCHv2] " Andrew Burgess
2022-05-18 12:27 ` Jan Beulich
2022-05-26 12:48 ` Andrew Burgess
2022-05-18 21:23 ` H.J. Lu
2022-05-27 17:44 ` [PATCHv3] " Andrew Burgess
2022-05-30 8:19 ` Jan Beulich
2022-05-31 17:20 ` Andrew Burgess
2022-06-01 5:59 ` Jan Beulich
2022-06-01 15:56 ` H.J. Lu
2022-06-08 16:03 ` Andrew Burgess
2022-06-10 10:56 ` Jan Beulich
2022-06-10 13:01 ` Andrew Burgess
2022-05-18 7:06 ` [PATCH 2/2] " Jan Beulich
2022-05-18 10:41 ` Andrew Burgess
2022-05-18 10:46 ` Jan Beulich
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=87wnf2n4fw.fsf@redhat.com \
--to=aburgess@redhat.com \
--cc=binutils@sourceware.org \
--cc=hjl.tools@gmail.com \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).