From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <simark@simark.ca>
Received: from simark.ca (simark.ca [158.69.221.121])
 by sourceware.org (Postfix) with ESMTPS id 9EA323857806
 for <gdb-patches@sourceware.org>; Tue, 24 May 2022 01:16:18 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 9EA323857806
Received: from [10.0.0.11] (192-222-157-6.qc.cable.ebox.net [192.222.157.6])
 (using TLSv1.3 with cipher TLS_AES_128_GCM_SHA256 (128/128 bits)
 key-exchange X25519 server-signature RSA-PSS (2048 bits))
 (No client certificate requested)
 by simark.ca (Postfix) with ESMTPSA id 7F5781E01D;
 Mon, 23 May 2022 21:16:17 -0400 (EDT)
Message-ID: <6570bb7a-be59-d5b4-f6b2-3e8503790f21@simark.ca>
Date: Mon, 23 May 2022 21:16:16 -0400
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
 Thunderbird/91.9.0
Subject: Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension
 language hook
Content-Language: en-US
To: Andrew Burgess <aburgess@redhat.com>, gdb-patches@sourceware.org
Cc: Andrew Burgess <andrew.burgess@embecosm.com>
References: <cover.1649110564.git.aburgess@redhat.com>
 <cover.1650878049.git.aburgess@redhat.com>
 <ed415e31ec48c6cc6cc844aee9acbc41b6c917b4.1650878049.git.aburgess@redhat.com>
 <50911018-847c-e060-6517-395f9410c80c@simark.ca> <878rrfn8t0.fsf@redhat.com>
From: Simon Marchi <simark@simark.ca>
In-Reply-To: <878rrfn8t0.fsf@redhat.com>
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit
X-Spam-Status: No, score=-7.1 required=5.0 tests=BAYES_00, DKIM_SIGNED,
 DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, NICE_REPLY_A, SPF_HELO_PASS,
 SPF_PASS, TXREP,
 T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6
X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on
 server2.sourceware.org
X-BeenThere: gdb-patches@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Gdb-patches mailing list <gdb-patches.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/gdb-patches/>
List-Post: <mailto:gdb-patches@sourceware.org>
List-Help: <mailto:gdb-patches-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/gdb-patches>,
 <mailto:gdb-patches-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Tue, 24 May 2022 01:16:20 -0000

Sorry for the late reply, I'm a bit busy with work and personal life.
I gave some answers below.  I don't think I'll have time to do a
thorough review of your v5.  But I already raised any concerns I had,
and I trust you for the rest.

>> Doesn't really matter, but this could probably modify the string in
>> the existing DisassemblerResult object, and then return it:
>>
>>   result.string += "\t## Comment"
>>   return result
>>
>> But I'm fine with what you have, if you think it's clearer for an
>> example.
> 
> The problem is all the properties of DisassemblerResult are read-only.
> Given it's pretty light-weight I didn't really see any problem just
> creating a new one.
> 
> I suspect that I might end up changing that in the future, but for now I
> don't see any great need to allow for modifications right now.  I figure
> extending the API to allow modifications in the future is fine if/when
> that becomes critical.
> 
> Let me know if that's going to be a problem and I can get the setting
> code added now.

Ack, this is reasonable.

>>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw)
>>> +{
>>> +  PyObject *info_obj, *memory_source_obj = nullptr;
>>> +  static const char *keywords[] = { "info", "memory_source", nullptr };
>>> +  if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords,
>>> +					&disasm_info_object_type, &info_obj,
>>> +					&memory_source_obj))
>>
>> I'm wondering, why is there a separate memory_source parameter when info
>> already has a read_memory method, that could potentially be overriden by
>> the user?
> 
> Here's how the API would be used right now:
> 
>   class MyDisassembler(Disassembler):
>     def __init__(self, name):
>       super().__init__(name)
>       self.__info = None
> 
>     def __call__(self, info):
>       self.__info = info
>       result = gdb.disassembler.builtin_disassemble(info, self)
> 
>     def read_memory(self, length, offset):
>       return self.__info.read_memory(length, offset)
> 
> This is obviosly pretty pointless, the memory source just calls the
> standard read_memory routine so you'll get the same behaviour as if no
> memory source was passed at all, but it shows how the API works.
> 
> If we wanted to override the DisassembleInfo.read_memory routine we'd
> do something like this:
> 
>   class MyInfo(DisassembleInfo):
>     def __init__(self,old_info):
>       super().__init__(old_info)
> 
>     def read_memory(self, length, offset):
>       return super().read_memory(length, offset)
> 
>   class MyDisassembler(Disassembler):
>     def __init__(self, name):
>       super().__init__(name)
> 
>     def __call__(self, info):
>       wrapped_info = MyInfo(info)
>       result = gdb.disassembler.builtin_disassemble(wrapped_info)
> 
> What are your thoughts on that?  I think that would be pretty easy to
> implement if you feel its an improvement.

It's been a bit too long since I've looked at this patch so I've lost
context.  I only remember thinking that instead of passing a
memory_source, you could simply pass an info with a special read_memory
implementation that reads from whatever you want.  So I think
essentially what you did in that last example.  The only remember I
would mention this is that it seems better to me to not have two ways of
doing the same thing, as it's sometimes ambiguous what happens when they
are both used, how they interact together.

>> Is there a use case for other error kinds?  For example, if the
>> disassembler finds that the machine code does not encode a valid
>> instruction, what should it do?
>>
>> Maybe this is more a question for the gdb.disassembler.Disassembler
>> implementation side of the API.
> 
> No!  Like the comment says, everything should disassemble to something,
> even if it's just ".byte xxxx".  The libopcodes disassembler only
> supports reporting one type of error, that's memory error.  It's just
> unfortunate libopcodes also uses the return value to indicate that an
> error occurred, and in some cases disassemblers return -1 (to indicate
> error) without setting a memory error.  In these cases the disassembler
> has probably written to the output stream what went wrong, but this
> really is not how libopcodes is supposed to work.
> 
> So, no.  We either have a memory error, or an unknown error.  Ideally,
> given enough time, libopcodes will be fixed so that we _only_ ever emit
> memory errors.

Ok.  Well, for an API that allows users to arbitrarily extend
functionality, it makes sense to always allow returning a generic error
of some kind.  Perhaps libopcodes can't fail, but perhaps (a bit
stretched example to illustrate) my disassembler makes a GET request to
an HTTP server to do the actual disassembling.  So it seems good to me
to always have a "something went wrong" escape hatch.

>> This makes me think, is there a way for a Python user to call into the
>> disassembler?  Should the DisassembleInfo object have a user-callable
>> constructor, should the user want to construct one?
>>
>> I could imagine you could do this out of nowhere:
>>
>>   gdb.disassembler.builtin_disassemble(DisassembleInfo(addr, arch, progspace))
> 
> No! Don't do that.
> 
> We already have gdb.Architecture.disassemble which provides access to
> the disassembler.  You might feel that method is misplaced on
> Architecture (but that wasn't me!), but it is what it is.
> 
> I think if you are writing some random piece of Python code then you
> should not be worrying about Python disassemblers vs builtin
> disassembler; you should just call gdb.Architecture.disassemble and let
> GDB invoke the "correct" disassembler for you.
> 
> Preventing direct calls to gdb.disassembler.builtin_disassemble is one
> of the main reasons that I deliberately don't provide a user callable
> constructor for DisassembleInfo, during development I did have that
> method at one point, and removed it precisely to prevent the above! 

Ack.

>> But that would skip the Python disassemblers, so a user could also want
>> to call this function that doesn't exist today:
>>
>>   gdb.disassemble(DisassembleInfo(addr, arch, progspace))
> 
> I think having:
> 
>   gdb.disassemble(start_address, end_address, architecture, program_space)
> 
> would be better than the current disassemble method on Architecture.
> Thinking about what that does I suspect that I might end up having to
> work on Architecture.disassemble at some point in the future, so I might
> add a top-level gdb.disassemble and make the existing architecture
> method forward to that one.  We'll see.

I think that would make sense.

>> Since it's kind of expected that _print_insn is there, should this be a
>> gdb_assert?  Just returning silently here makes it more difficult to
>> investigate problems, IMO.  The only reason for the assert to trigger
>> would be if someone messed with the GDB Python modules, which I think is
>> ok.
> 
> I've not gone with an assert, but I did rewrite this code so now the
> user will get an error if _print_insn is not present.  I did that by
> removing the HasAttrString check here, and then...
> 
>>
>>> +
>>> +  /* Now grab the callback attribute from the module.  */
>>> +  gdbpy_ref<> hook
>>> +    (PyObject_GetAttrString (gdb_python_disassembler_module.get (),
>>> +			     callback_name));
>>> +  if (hook == nullptr)
>>
>> This can't be true, since you already checked with
>> PyObject_HasAttrString.
> 
> ... this check is now useful, the GetAttrString will fail if _print_insn
> is not present, and the PyErr will be set.

Ok.

Simon