From: "H.J. Lu" <hjl.tools@gmail.com>
To: Andrew Burgess <aburgess@redhat.com>
Cc: Jan Beulich <jbeulich@suse.com>, binutils@sourceware.org
Subject: Re: [PATCH 2/2] libopcodes: extend the styling within the i386 disassembler
Date: Tue, 3 May 2022 08:47:22 -0700
Message-ID: <CAMe9rOp7dFRBCgx-JmNUWwYOtt9kxvtDrMrY66L+UL-Sy5=N1w@mail.gmail.com>
In-Reply-To: <87zgjyn4k1.fsf@redhat.com>

On Tue, May 3, 2022 at 6:14 AM Andrew Burgess via Binutils
<binutils@sourceware.org> wrote:
>
> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>
> > On 29.04.2022 15:42, Andrew Burgess via Binutils wrote:
> >> The i386 disassembler is pretty complex.  Most disassembly is done
> >> indirectly; operands are built into buffers within a struct instr_info
> >> instance, before finally being printed later in the disassembly
> >> process.
> >>
> >> Sometimes the operand buffers are built in a different order to the
> >> order in which they will eventually be printed.
> >>
> >> Each operand can contain multiple components, e.g. multiple registers,
> >> immediates, other textual elements (commas, brackets, etc).
> >>
> >> When looking for how to apply styling I guess the ideal solution would
> >> be to move away from the operands being a single string that is built
> >> up, and instead have each operand be a list of "parts", where each
> >> part is some text and a style.  Then, when we eventually print the
> >> operand we would loop over the parts and print each part with the
> >> correct style.
> >>
> >> But it feels like a huge amount of work to move from where we are
> >> now to that potentially ideal solution.  Plus, the above solution
> >> would be pretty complex.
> >>
> >> So, instead I propose a .... different solution here, one that works
> >> with the existing infrastructure.
> >>
> >> As each operand is built up, piece by piece, we pass through style
> >> information.  This style information is then encoded into the operand
> >> buffer (see below for details).  After this the code can continue to
> >> operate as it does right now in order to manage the set of operand
> >> buffers.
> >>
> >> Then, as each operand is printed we can split the operand buffer into
> >> chunks at the style marker boundaries, with each chunk being printed
> >> in the correct style.
> >>
> >> For encoding the style information I use the format "~%x~".  As far as
> >> I can tell the '~' is not otherwise used in the i386 disassembler, so
> >> this should serve as a unique marker.  To speed up writing and then
> >> reading the style markers, I take advantage of the fact that there are
> >> fewer than 16 styles so I know the '%x' will only ever be a single hex
> >> character.
> >
> > Like H.J. I'd like to ask that you avoid ~ here (I actually have plans
> > to use it to make at least some 64-bit constants better recognizable);
> > I'm not sure about using non-ASCII though, as that may cause issues with
> > compilers treating non-ASCII incorrectly. I'd soften this to non-alnum,
> > non-operator characters (perhaps more generally non-printable). OTOH I
> > guess about _any_ character could be used in symbol names, so I'm not
> > convinced such an escaping model can be generally conflict free.
>
> Hi Jan,
>
> I've addressed all the simple feedback from H.J. and Vladimir, and I
> just need to figure out something for the escaping mechanism.
>
> I'm still keen to go with an escaping-based solution; my reasoning is
> that this is the solution least likely to introduce latent disassembler
> bugs.
>
> However, that position is based on my belief that there's no exhaustive
> test for the i386 based disassembler, i.e. one that tests every single
> valid instruction disassembles correctly.  If there was such a test then
> I might be more tempted to try something more radical...
>
> That said, if I was going to stick with an escaping scheme, then I have
> some ideas for moving forward.
>
> The current scheme relies on the fact that symbols are not printed
> directly from the i386 disassembler, instead the i386 disassembler calls
> back into the driver application (objdump, gdb) to print the symbol.  As
> a result, symbols don't go through the instr_info::obuf buffer.  This
> means that we never try to interpret a symbol name for escape
> characters.
>
> This avoids one of the issues that you raised: what if the escape
> character appears in a symbol name?  The answer is that I just don't
> need to worry about this!
>
> So, I only need to ensure that the escape character is:
>
>   (a) not a character that the disassembler currently tries to directly
>   print itself, and
>
>   (b) not something that will ever be printed as part of an immediate.
>
> Clearly my choice passes both right now, but it looks like it will not
> pass (b) forever.
>
> One possible solution would be to replace all the remaining places where
> we directly write to instr_info::obuf with calls to oappend_char.  I
> could then extend the oappend API such that we do "real" escaping, that
> is (assuming the continued use of '~' for now): '~X' would indicate a
> style marker, with X being the style number, and '~~' would indicate a
> literal '~' character.  In this way we really wouldn't care which
> character we used (though we'd probably pick one that didn't crop up too
> often, just for ease of parsing the buffers).
>
> An alternative solution would be to pick a non-printable character,
> e.g. \001, and use this as the escape character in place of the current
> '~'.  This seems to pass the (a) and (b) tests above, and if such a
> character does ever appear in a symbol name, then, as I've said above, I
> don't believe this would cause us any problems.

I like \001.   We can always change it later.  Let's wait for input from Jan.
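
To make the \001 idea concrete, here is a minimal C sketch.  It is not
code from the patch.  It assumes the marker keeps the shape of the
proposed "~%x~" with \001 in place of '~', so a style change is encoded
as \001, one hex digit, \001.  The names STYLE_MARKER, put_style,
split_styled, chunk_fn, struct sink and render_chunk are all
hypothetical and exist only for this illustration.

```c
#include <assert.h>
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Assumed escape byte; \001 is one of the options discussed above.  */
#define STYLE_MARKER '\001'

/* Called once per chunk.  TEXT is LEN bytes long and not NUL-terminated.  */
typedef void (*chunk_fn) (unsigned style, const char *text, size_t len,
                          void *data);

/* Write MARKER, one hex digit, MARKER at BUF.  Returns bytes written.  */
static size_t
put_style (char *buf, unsigned style)
{
  assert (style < 16);          /* Fewer than 16 styles: one hex digit.  */
  buf[0] = STYLE_MARKER;
  buf[1] = "0123456789abcdef"[style];
  buf[2] = STYLE_MARKER;
  return 3;
}

/* Convert a single hex digit character to its value.  */
static unsigned
hex_value (char c)
{
  if (isdigit ((unsigned char) c))
    return (unsigned) (c - '0');
  return (unsigned) (tolower ((unsigned char) c) - 'a' + 10);
}

/* Split BUF at style markers and pass each non-empty chunk to EMIT
   together with the style that is active for that chunk.  */
static void
split_styled (const char *buf, unsigned default_style, chunk_fn emit,
              void *data)
{
  unsigned style = default_style;
  const char *start = buf;
  const char *p = buf;

  assert (buf != NULL && emit != NULL);
  while (*p != '\0')
    {
      if (p[0] == STYLE_MARKER
          && isxdigit ((unsigned char) p[1])
          && p[2] == STYLE_MARKER)
        {
          if (p > start)
            emit (style, start, (size_t) (p - start), data);
          style = hex_value (p[1]);
          p += 3;
          start = p;
        }
      else
        ++p;
    }
  if (p > start)
    emit (style, start, (size_t) (p - start), data);
}

/* Demo sink: records each chunk as "[style:text]" so the split is visible.  */
struct sink
{
  char text[128];
};

static void
render_chunk (unsigned style, const char *text, size_t len, void *data)
{
  struct sink *out = data;
  size_t used = strlen (out->text);

  snprintf (out->text + used, sizeof out->text - used, "[%x:%.*s]",
            style, (int) len, text);
}
```

With a buffer built as style 2, "%rax", style 0, ",", style 2, "%rbx",
calling split_styled with render_chunk leaves "[2:%rax][0:,][2:%rbx]"
in the sink.  A real printer would call the styled print callback at
that point instead of formatting into a string.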

> Here's a session that demonstrates a symbol containing '~' with the
> current patch (obviously the final disassembler call actually has
> colour in the output, which all looks correct to me):
>
>   $ cat /tmp/weird.s
>           .text
>           .global "foo~bar"
>           "foo~bar":
>           nop
>           nop
>           nop
>           call       "foo~bar"
>   $ as -o /tmp/weird.o /tmp/weird.s
>   $ ./binutils/objdump --disassembler-color=extended-color -d /tmp/weird.o
>
>   /tmp/weird.o:     file format elf64-x86-64
>
>
>   Disassembly of section .text:
>
>   0000000000000000 <foo~bar>:
>      0: 90                      nop
>      1: 90                      nop
>      2: 90                      nop
>      3: e8 00 00 00 00          call   8 <foo~bar+0x8>
>
>
> Thanks,
> Andrew
>


-- 
H.J.
