From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.133.124]) by sourceware.org (Postfix) with ESMTPS id CBDB7385E009 for ; Mon, 21 Feb 2022 18:01:19 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org CBDB7385E009 Received: from mail-wr1-f71.google.com (mail-wr1-f71.google.com [209.85.221.71]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-110-gcnKRC_LPU6W1i9pgsQSeA-1; Mon, 21 Feb 2022 13:01:18 -0500 X-MC-Unique: gcnKRC_LPU6W1i9pgsQSeA-1 Received: by mail-wr1-f71.google.com with SMTP id q21-20020adfab15000000b001e57c9a342aso7729035wrc.6 for ; Mon, 21 Feb 2022 10:01:18 -0800 (PST) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version; bh=KHC8DTwOgnuRLvx8yH6cscxRLBKi3OpkualAMjnGZ6Y=; b=WJ7asJluPxTRVLOLFd79fqGNd7zniY6bF4Ox6fysjDeWOWZq8nKln95JtdkR2i4mBB tOuc24/bT+88fKNoRn7DTf6YwDUJbr64bAwUsQ37msiz4V37UXbV4oecDwtOYStXl2JK L6hHyfKQIbP3V3jSU8eStuO3z2F7Rt0q0TObUCmyUwBGADEXJDe5wrz8CeuywOidVF8b 1Rhw+03Y/h5D+WJXMXhJtUaWZdHWdu7NdCdRmLhfU6HiP1mctdSEJbnEtwn3l8/IlDwO L+jHp2qBbqEMoSYCUwS4/2S660cu4E9z1rTFOmHwsIkG9CPy+OJGaivY9JMWi1/FOV4C RFIA== X-Gm-Message-State: AOAM5316YsyKt968Rg+MuWlIuP1orl8PnNQKcUP+XJCG4lQBwdHnB53u vGyM8FaGeOD30BxLIrO7skc6x9u1EydSlaQE4lSFm57wV5p9SgX+3rrVYjtGZvG+ZmpfTUcQa0E mSe6+RU425vDc3xTt5g== X-Received: by 2002:a05:600c:5102:b0:37f:79b4:2e5c with SMTP id o2-20020a05600c510200b0037f79b42e5cmr150389wms.80.1645466476836; Mon, 21 Feb 2022 10:01:16 -0800 (PST) X-Google-Smtp-Source: ABdhPJxHzyKW2/Cy2tRs4poLaVj2pj+WqTgIgrxyFwnMn6xcWSjRno1m2aAsCaL0l50/AG69SAApYQ== X-Received: by 2002:a05:600c:5102:b0:37f:79b4:2e5c with SMTP id o2-20020a05600c510200b0037f79b42e5cmr150364wms.80.1645466476503; Mon, 21 Feb 2022 10:01:16 -0800 (PST) Received: from localhost (host86-169-131-29.range86-169.btcentralplus.com. [86.169.131.29]) by smtp.gmail.com with ESMTPSA id f18sm92565wmg.21.2022.02.21.10.01.15 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 21 Feb 2022 10:01:15 -0800 (PST) From: Andrew Burgess To: Jan Beulich Cc: binutils@sourceware.org Subject: Re: [PATCH 3/3] opcodes/i386: partially implement disassembler style support In-Reply-To: <6ad59383-d742-a217-a0e6-09d32fa4e900@suse.com> References: <11996f886e69218629abf81f8041269e8740a60e.1645043588.git.aburgess@redhat.com> <37ba6375-00cf-3041-ee79-21377558402c@suse.com> <87o835eaf9.fsf@redhat.com> <2723bccf-3591-d1a7-2c94-741fcd26b8e7@suse.com> <87bkz5dsqu.fsf@redhat.com> <15b09dbb-613f-a7fe-b58e-27317d0852dd@suse.com> <87sfsfceju.fsf@redhat.com> <6ad59383-d742-a217-a0e6-09d32fa4e900@suse.com> Date: Mon, 21 Feb 2022 18:01:14 +0000 Message-ID: <8735kccd5h.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-5.6 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: binutils@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Binutils mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Mon, 21 Feb 2022 18:01:21 -0000 Jan Beulich via Binutils writes: > On 19.02.2022 11:54, Andrew Burgess wrote: >> Jan Beulich via Binutils writes: >> >>> On 17.02.2022 23:37, Andrew Burgess wrote: >>>> Jan Beulich via Binutils writes: >>>>> On 17.02.2022 17:15, Andrew Burgess wrote: >>>>>> Jan Beulich via Binutils writes: >>>>>>> On 16.02.2022 21:53, Andrew Burgess via Binutils wrote: >>>>>>>> + (*ins->info->fprintf_styled_func) >>>>>>>> + (ins->info->stream, dis_style_text, " "); >>>>>>>> + (*ins->info->fprintf_styled_func) >>>>>>>> + (ins->info->stream, dis_style_immediate, "0x%x", >>>>>>>> + (unsigned int) priv.the_buffer[0]); >>>>>>> >>>>>>> I wonder if the naming (dis_style_immediate) isn't misleading. As per >>>>>>> the comment next to its definition it really appears to mean any kind >>>>>>> of number (like is the case here), not just immediate operands of >>>>>>> instructions. Hence maybe dis_style_number (as replacement for or in >>>>>>> addition to dis_style_immediate)? >>>>>> >>>>>> You mentioned this before in the previous thread, and I didn't really >>>>>> understand then either. >>>>>> >>>>>> Can you give an example of something that's a number, but not an >>>>>> immediate? e.g. I wonder (given the instruction/directive distinction >>>>>> you draw above), I wonder if you're conserned about: '.byte 0x4', maybe >>>>>> you don't like referring to this 0x4 here as an immediate? >>>>> >>>>> Well, an operand to a directive for example is not an immediate imo, >>>>> yes. A "load offset" (as your comment calls it) may also not be an >>>>> immediate. E.g. in x86 memory access instructions: >>>>> >>>>> mov 0x10(%rbx), %eax >>>>> >>>>> the 0x10 isn't an immediate, but a displacement. The difference may >>>>> be more relevant in something like >>>>> >>>>> mov $0, 0x10(%rbx) >>>>> >>>>> where the $0 is an immediate operand, but the 0x10 isn't (and you >>>>> wouldn't want to mix the two). >>>>> >>>>> From that comment it's not clear to me where else you would think >>>>> "immediate" applies (or not), but in RISC-V's >>>>> >>>>> lw x0, 0x10(x0) >>>>> >>>>> I wouldn't consider the 0x10 an immediate either, albeit this may >>>>> be a result of my x86 bias. >>>> >>>> I wonder if there's a name we could come up with that would allow me to >>>> classify the '$0' and '0x10' (in your example above) as the same style? >>>> >>>> I've kind-of lost the thread a bit, but maybe that's what the 'number' >>>> you suggested original was for? If I replaced dis_style_immediate with >>>> dis_style_number, and just replaced thoughout, would that be less >>>> problematic? >>> >>> Yes, "number" was meant to be a possible replacement. Whether it's >>> helpful to style all forms of numbers the same is questionable >>> though. Note that to me in "$0" the '$' would not be covered by >>> "number" then, while it would be covered by "immediate". >>> >>>> Another possibility would be to have some aliases either in the original >>>> enum, as in: >>>> >>>> dis_style_displacement = dis_style_immediate, >>>> >>>> or even at the top of i386-dis.c, as in: >>>> >>>> #define dis_style_displacement dis_style_immediate >>> >>> But that's wrong. I'm not primarily after the naming in the sources, >>> but after the output not showing distinct things as similar. >>> >>>> I really think we should avoid adding too many distinct styles if we >>>> can. My concern is less about disassembler users handling the different >>>> styles, and more about consistency between the disassemblers. I figure >>>> it's easier to be consistent if we only have a small number of styles. >>>> If displacement is a different style to other immediate (yes, I'm still >>>> going to call numbers in instruction immediates!) then we end up with >>>> some architectures going one way, and others another. >>> >>> I agree with the goal of limiting the number of styles. But instead >>> of marking distinct items with similar non-basic (text) styles, I'd >>> rather see items left alone then. IOW with dis_style_immediate I'd >>> prefer (in the x86 example) to see only true immediate insn operands >>> be tagged that way, and all other numbers to remain dis_style_text. >> >> Having reviewed the thread, I've tried to come up with the complete list >> of styles that I believe your are arguing for. These incude a new >> directive style, and an address_offset style. I've extended the text in >> several places to cover how to handle prefixes, as well as how styles >> should be used for directives. >> >> I'd be grateful if you could read through this list, and give any >> examples of things which, if styled using the rules below, you feel >> would be unacceptable. > > I think this all looks good (but of course once this can actually > be seen in use, more things may pop up), provided ... For sure. But I think this conversation has been really constructive to address some early issues. > >> /* This is the default style, use this for any additional syntax >> (e.g. commas between operands, brackets, etc), or just as a default if >> no other style seems appropriate. */ >> dis_style_text, >> >> /* Use this for all instruction mnemonics, or aliases for mnemonics. >> These should be things that correspond to real machine >> instructions. */ >> dis_style_mnemonic, >> >> /* For things that aren't real machine instructions, but rather >> assembler directives, e.g. .byte, etc. */ >> dis_style_assembler_directive, >> >> /* Use this for any register names. This may or may-not include any >> register prefix, e.g. '$', '%', at the discretion of the target, >> though within each target the choice to include prefixes for not >> should be kept consistent. If the prefix is not printed with this >> style, then dis_style_text should be used. */ >> dis_style_register, >> >> /* Use this for any constant values used within instructions or >> directives, unless the value is an absolute address, or an offset >> that will be added to an address (no matter where the address comes >> from) before use. This style may, or may-not be used for any >> prefix to the immediate value, e.g. '$', at the discretion of the >> target, though within each target the choice to include these >> prefixes should be kept consistent. */ >> dis_style_immediate, >> >> /* The style for the numerical representation of an absolute address. >> Anything that is an address offset should use the immediate style. >> This style may, or may-not be used for any prefix to the immediate >> value, e.g. '$', at the discretion of the target, though within >> each target the choice to include these prefixes should be kept >> consistent. */ >> dis_style_address, >> >> /* The style for any constant value within an instruction or directive >> that represents an offset that will be added to an address before >> use. This style may, or may-not be used for any prefix to the >> immediate value, e.g. '$', at the discretion of the target, though >> within each target the choice to include these prefixes should be >> kept consistent. */ >> dis_style_address_offset, > > ... it was more a copy-n-paste mistake to repeat the reference to $ etc > in these latter two? Or is this to cover e.g. Arm prefixing numbers by > # in a wider fashion than x86's use of $? In this case, using # as the > example character may avoid some confusion. I didn't have any particular architecture in mind, I just wanted to maintain a symmetry with dis_style_immediate (that there might be a prefix, and it could be given this style, if that's the preference). As you suggest, I'll update the text to use a different character so it look less like a copy & paste error. > >> /* The style for a symbol's name. The numerical address of a symbol >> should use the address style above, this style is reserved for the >> name. */ >> dis_style_symbol, > > There may be a remaining ambiguity here: What is the intended style to > be used for +? Just dis_style_symbol or first > dis_style_symbol and then dis_style_address_offset? Yes, I'd expect a mix of dis_style_symbol and dis_style_address_offset in this case. I'll expand the comment to explicitly mention this case. I'll merge these fixes locally, and wait to see if anyone else has any feedback before I repost this series. Thanks, Andrew