Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support


From: Andrew Burgess <aburgess@redhat.com>
To: Jan Beulich <jbeulich@suse.com>
Cc: binutils@sourceware.org
Subject: Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support
Date: Mon, 21 Feb 2022 18:01:14 +0000
Message-ID: <8735kccd5h.fsf@redhat.com>
In-Reply-To: <6ad59383-d742-a217-a0e6-09d32fa4e900@suse.com>

Jan Beulich via Binutils <binutils@sourceware.org> writes:

> On 19.02.2022 11:54, Andrew Burgess wrote:
>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>> 
>>> On 17.02.2022 23:37, Andrew Burgess wrote:
>>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>>> On 17.02.2022 17:15, Andrew Burgess wrote:
>>>>>> Jan Beulich via Binutils <binutils@sourceware.org> writes:
>>>>>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote:
>>>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>>>> +		(ins->info->stream, dis_style_text, " ");
>>>>>>>> +	      (*ins->info->fprintf_styled_func)
>>>>>>>> +		(ins->info->stream, dis_style_immediate, "0x%x",
>>>>>>>> +		 (unsigned int) priv.the_buffer[0]);
>>>>>>>
>>>>>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per
>>>>>>> the comment next to its definition it really appears to mean any kind
>>>>>>> of number (like is the case here), not just immediate operands of
>>>>>>> instructions. Hence maybe dis_style_number (as replacement for or in
>>>>>>> addition to dis_style_immediate)?
>>>>>>
>>>>>> You mentioned this before in the previous thread, and I didn't really
>>>>>> understand then either.
>>>>>>
>>>>>> Can you give an example of something that's a number, but not an
>>>>>> immediate?  E.g. given the instruction/directive distinction you draw
>>>>>> above, I wonder if you're concerned about '.byte 0x4'; maybe you
>>>>>> don't like referring to the 0x4 here as an immediate?
>>>>>
>>>>> Well, an operand to a directive for example is not an immediate imo,
>>>>> yes. A "load offset" (as your comment calls it) may also not be an
>>>>> immediate. E.g. in x86 memory access instructions:
>>>>>
>>>>> 	mov	0x10(%rbx), %eax
>>>>>
>>>>> the 0x10 isn't an immediate, but a displacement. The difference may
>>>>> be more relevant in something like
>>>>>
>>>>> 	mov	$0, 0x10(%rbx)
>>>>>
>>>>> where the $0 is an immediate operand, but the 0x10 isn't (and you
>>>>> wouldn't want to mix the two).
>>>>>
>>>>> From that comment it's not clear to me where else you would think
>>>>> "immediate" applies (or not), but in RISC-V's
>>>>>
>>>>> 	lw	x0, 0x10(x0)
>>>>>
>>>>> I wouldn't consider the 0x10 an immediate either, albeit this may
>>>>> be a result of my x86 bias.
>>>>
>>>> I wonder if there's a name we could come up with that would allow me to
>>>> classify the '$0' and '0x10' (in your example above) as the same style?
>>>>
>>>> I've kind-of lost the thread a bit, but maybe that's what the 'number'
>>>> you suggested originally was for?  If I replaced dis_style_immediate
>>>> with dis_style_number throughout, would that be less problematic?
>>>
>>> Yes, "number" was meant to be a possible replacement. Whether it's
>>> helpful to style all forms of numbers the same is questionable
>>> though. Note that to me in "$0" the '$' would not be covered by
>>> "number" then, while it would be covered by "immediate".
>>>
>>>> Another possibility would be to have some aliases either in the original
>>>> enum, as in:
>>>>
>>>>   dis_style_displacement = dis_style_immediate,
>>>>
>>>> or even at the top of i386-dis.c, as in:
>>>>
>>>>   #define dis_style_displacement dis_style_immediate
>>>
>>> But that's wrong. I'm not primarily after the naming in the sources,
>>> but after the output not showing distinct things as similar.
>>>
>>>> I really think we should avoid adding too many distinct styles if we
>>>> can.  My concern is less about disassembler users handling the different
>>>> styles, and more about consistency between the disassemblers.  I figure
>>>> it's easier to be consistent if we only have a small number of styles.
>>>> If displacement is a different style to other immediates (yes, I'm still
>>>> going to call numbers in instructions immediates!) then we end up with
>>>> some architectures going one way, and others another.
>>>
>>> I agree with the goal of limiting the number of styles. But instead
>>> of marking distinct items with similar non-basic (text) styles, I'd
>>> rather see items left alone then. IOW with dis_style_immediate I'd
>>> prefer (in the x86 example) to see only true immediate insn operands
>>> be tagged that way, and all other numbers to remain dis_style_text.
>> 
>> Having reviewed the thread, I've tried to come up with the complete list
>> of styles that I believe you are arguing for.  These include a new
>> directive style, and an address_offset style.  I've extended the text in
>> several places to cover how to handle prefixes, as well as how styles
>> should be used for directives.
>> 
>> I'd be grateful if you could read through this list, and give any
>> examples of things which, if styled using the rules below, you feel
>> would be unacceptable.
>
> I think this all looks good (but of course once this can actually
> be seen in use, more things may pop up), provided ...

For sure.  But I think this conversation has been really constructive in
addressing some early issues.

>
>>   /* This is the default style, use this for any additional syntax
>>      (e.g. commas between operands, brackets, etc), or just as a default if
>>      no other style seems appropriate.  */
>>   dis_style_text,
>> 
>>   /* Use this for all instruction mnemonics, or aliases for mnemonics.
>>      These should be things that correspond to real machine
>>      instructions.  */
>>   dis_style_mnemonic,
>> 
>>   /* For things that aren't real machine instructions, but rather
>>      assembler directives, e.g. .byte, etc.  */
>>   dis_style_assembler_directive,
>> 
>>   /* Use this for any register names.  This may or may not include any
>>      register prefix, e.g. '$', '%', at the discretion of the target,
>>      though within each target the choice to include prefixes or not
>>      should be kept consistent.  If the prefix is not printed with this
>>      style, then dis_style_text should be used.  */
>>   dis_style_register,
>> 
>>   /* Use this for any constant values used within instructions or
>>      directives, unless the value is an absolute address, or an offset
>>      that will be added to an address (no matter where the address comes
>>      from) before use.  This style may or may not be used for any
>>      prefix to the immediate value, e.g. '$', at the discretion of the
>>      target, though within each target the choice to include these
>>      prefixes should be kept consistent.  */
>>   dis_style_immediate,
>> 
>>   /* The style for the numerical representation of an absolute address.
>>      Anything that is an address offset should use the address_offset
>>      style.  This style may or may not be used for any prefix to the
>>      immediate value, e.g. '$', at the discretion of the target, though
>>      within each target the choice to include these prefixes should be
>>      kept consistent.  */
>>   dis_style_address,
>> 
>>   /* The style for any constant value within an instruction or directive
>>      that represents an offset that will be added to an address before
>>      use.  This style may or may not be used for any prefix to the
>>      immediate value, e.g. '$', at the discretion of the target, though
>>      within each target the choice to include these prefixes should be
>>      kept consistent.  */
>>   dis_style_address_offset,
>
> ... it was more a copy-n-paste mistake to repeat the reference to $ etc
> in these latter two? Or is this to cover e.g. Arm prefixing numbers by
> # in a wider fashion than x86's use of $? In this case, using # as the
> example character may avoid some confusion.

I didn't have any particular architecture in mind, I just wanted to
maintain a symmetry with dis_style_immediate (that there might be a
prefix, and it could be given this style, if that's the preference).  As
you suggest, I'll update the text to use a different character so it
looks less like a copy & paste error.
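
To make the prefix rule concrete, here is a minimal sketch of the two
choices a target could make for a prefixed number.  Everything named
demo_* is hypothetical and exists only for illustration, and the enum
is a stand-in for the list above, not the real header.  Whichever way
a target goes, it should pass the same prefix_in_style value everywhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the proposed enum; not the real dis-asm.h definition.  */
enum demo_style
{
  dis_style_text,
  dis_style_immediate,
  dis_style_address,
  dis_style_address_offset
};

/* One styled chunk of disassembler output (hypothetical type).  */
struct demo_token
{
  enum demo_style style;
  char text[24];
};

/* Split PREFIX followed by VALUE into styled tokens.  If PREFIX_IN_STYLE
   is non-zero the prefix is folded into the number's style, otherwise
   the prefix gets dis_style_text and only the digits get NUMBER_STYLE.
   Returns the number of tokens written to OUT (1 or 2).  */
static int
demo_style_prefixed_number (enum demo_style number_style, char prefix,
                            unsigned int value, int prefix_in_style,
                            struct demo_token out[2])
{
  assert (out != NULL);

  if (prefix_in_style)
    {
      out[0].style = number_style;
      snprintf (out[0].text, sizeof out[0].text, "%c0x%x", prefix, value);
      return 1;
    }

  out[0].style = dis_style_text;
  snprintf (out[0].text, sizeof out[0].text, "%c", prefix);
  out[1].style = number_style;
  snprintf (out[1].text, sizeof out[1].text, "0x%x", value);
  return 2;
}
```

The same helper works for dis_style_immediate, dis_style_address and
dis_style_address_offset, which is the symmetry I was after.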

>
>>   /* The style for a symbol's name.  The numerical address of a symbol
>>      should use the address style above, this style is reserved for the
>>      name.  */
>>   dis_style_symbol,
>
> There may be a remaining ambiguity here: What is the intended style to
> be used for <symbol>+<offset>? Just dis_style_symbol or first
> dis_style_symbol and then dis_style_address_offset?

Yes, I'd expect a mix of dis_style_symbol and dis_style_address_offset
in this case.  I'll expand the comment to explicitly mention this case.
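
As a rough sketch of what I mean (assuming a callback with the same
shape as the fprintf_styled_func calls in the patch; the demo_* names
and the "{tag:text}" output format are made up for illustration, and
the enum is only a stand-in for the list above):

```c
#include <assert.h>
#include <stdarg.h>
#include <stdio.h>
#include <string.h>

/* Stand-in for the proposed enum; not the real dis-asm.h definition.  */
enum demo_style
{
  dis_style_text,
  dis_style_symbol,
  dis_style_address_offset
};

/* Hypothetical stream: records each styled chunk as "{tag:text}" so the
   styling is visible in plain text.  */
struct demo_stream
{
  char buf[256];
  size_t len;
};

static const char *
demo_style_tag (enum demo_style style)
{
  switch (style)
    {
    case dis_style_symbol: return "sym";
    case dis_style_address_offset: return "off";
    default: return "txt";
    }
}

/* Takes a stream, a style, then a printf-style format.  */
static int
demo_fprintf_styled (void *stream, enum demo_style style,
                     const char *fmt, ...)
{
  struct demo_stream *s = stream;
  char text[64];
  va_list ap;
  int n;

  assert (stream != NULL);

  va_start (ap, fmt);
  vsnprintf (text, sizeof text, fmt, ap);
  va_end (ap);

  n = snprintf (s->buf + s->len, sizeof s->buf - s->len, "{%s:%s}",
                demo_style_tag (style), text);
  /* Only advance if the chunk fitted completely.  */
  if (n > 0 && (size_t) n < sizeof s->buf - s->len)
    s->len += (size_t) n;
  return n;
}

/* Print <symbol+offset>: the name in the symbol style, the offset in
   the address_offset style, and the punctuation in the text style.  */
static void
demo_print_symbol_offset (struct demo_stream *s, const char *name,
                          unsigned long offset)
{
  demo_fprintf_styled (s, dis_style_text, "<");
  demo_fprintf_styled (s, dis_style_symbol, "%s", name);
  demo_fprintf_styled (s, dis_style_text, "+");
  demo_fprintf_styled (s, dis_style_address_offset, "0x%lx", offset);
  demo_fprintf_styled (s, dis_style_text, ">");
}
```

Calling demo_print_symbol_offset with "main" and 0x10 on an empty
stream leaves "{txt:<}{sym:main}{txt:+}{off:0x10}{txt:>}" in the
buffer, i.e. the name and the offset end up in different styles.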

I'll merge these fixes locally, and wait to see if anyone else has any
feedback before I repost this series.

Thanks,
Andrew


