From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-124.mimecast.com (us-smtp-delivery-124.mimecast.com [170.10.129.124]) by sourceware.org (Postfix) with ESMTPS id 103CE385627D for ; Wed, 25 May 2022 10:37:12 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 103CE385627D Received: from mail-wr1-f72.google.com (mail-wr1-f72.google.com [209.85.221.72]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-625-EbVS_BvSObe3CDp0uhV24g-1; Wed, 25 May 2022 06:37:10 -0400 X-MC-Unique: EbVS_BvSObe3CDp0uhV24g-1 Received: by mail-wr1-f72.google.com with SMTP id c11-20020adffb4b000000b0020ff998391dso939419wrs.8 for ; Wed, 25 May 2022 03:37:10 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version; bh=F9qpZoZrOLEeB9e6rnjCHJnO383dmnGqw2vWyjc6DFQ=; b=3dTPHPmJXotLvQiJMLx+FJ34xRcI1vQCta7wQ0ufmoyl6NtviaEw7HTPReyP8edzR6 LyNk1/mSd0tTLcunBt8nbY/O5vH9rKybOpAZYt3E5PWADTzNiOQwC46AxNVmZ/NbMZDA pKjlUZKMqOvRaC7Tq8eRovM1N8xuH7b3oBUz1H4dNcWpPBE4/opMC/tgeyov7dklOV8C eyjrIAvC782Uy0jShfQG1lExDc8+XjC+6CRF0MaT9mMIQGNhEnvDcgg5lu4gNzIcjrQZ +j+yGJblAO22c5Xc0nsLo5JBt9Jhw7i0pome1dzk/BbHxkeQAB0SfSoRbykuVcPyHSRM gF4g== X-Gm-Message-State: AOAM530i7E+EVcLyNWypSUPFHe+ZkoHUy0qrJallURfXdW7y14KvNq1A ZfI3LyLW0l8C3SzQpuHfihNNI+/VcyR/Z5zaLWEyD91uQGjNNVoxt9DS0w1k4irnIfiuR6PjqE6 7Z0XlFwzkYFL1ogg8RyCvuA== X-Received: by 2002:a05:600c:600d:b0:397:3801:3787 with SMTP id az13-20020a05600c600d00b0039738013787mr7517216wmb.113.1653475029366; Wed, 25 May 2022 03:37:09 -0700 (PDT) X-Google-Smtp-Source: ABdhPJxdfQDC+IteiPqT9g8opJvg1bg0croUqRSLe9GLoqE0iZpT66vEigjwqnH9MkjUK3T3LuijHA== X-Received: by 2002:a05:600c:600d:b0:397:3801:3787 with SMTP id az13-20020a05600c600d00b0039738013787mr7517181wmb.113.1653475029004; Wed, 25 May 2022 03:37:09 -0700 (PDT) Received: from localhost (host109-154-214-95.range109-154.btcentralplus.com. [109.154.214.95]) by smtp.gmail.com with ESMTPSA id i7-20020a5d55c7000000b0020c5253d8d3sm1732739wrw.31.2022.05.25.03.37.07 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Wed, 25 May 2022 03:37:08 -0700 (PDT) From: Andrew Burgess To: Simon Marchi , gdb-patches@sourceware.org Cc: Andrew Burgess Subject: Re: [PATCHv4 3/5] gdb/python: implement the print_insn extension language hook In-Reply-To: <87ee0jib8d.fsf@redhat.com> References: <50911018-847c-e060-6517-395f9410c80c@simark.ca> <878rrfn8t0.fsf@redhat.com> <6570bb7a-be59-d5b4-f6b2-3e8503790f21@simark.ca> <87ee0jib8d.fsf@redhat.com> Date: Wed, 25 May 2022 11:37:06 +0100 Message-ID: <877d69j3u5.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-6.3 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Wed, 25 May 2022 10:37:14 -0000 Andrew Burgess writes: > Simon Marchi via Gdb-patches writes: > >> Sorry for the late reply, I'm a bit busy with work and personal life. >> I gave some answers below. I don't think I'll have time to do a >> thorough review of your v5. But I already raised any concerns I had, >> and I trust you for the rest. > > Understood. Thanks for all the time you've already put into reviewing > this series. > >> >>>> Doesn't really matter, but this could probably modify the string in >>>> the existing DisassemblerResult object, and then return it: >>>> >>>> result.string += "\t## Comment" >>>> return result >>>> >>>> But I'm fine with what you have, if you think it's clearer for an >>>> example. >>> >>> The problem is all the properties of DisassemblerResult are read-only. >>> Given it's pretty light-weight I didn't really see any problem just >>> creating a new one. >>> >>> I suspect that I might end up changing that in the future, but for now I >>> don't see any great need to allow for modifications right now. I figure >>> extending the API to allow modifications in the future is fine if/when >>> that becomes critical. >>> >>> Let me know if that's going to be a problem and I can get the setting >>> code added now. >> >> Ack, this is reasonable. >> >>>>> +disasmpy_builtin_disassemble (PyObject *self, PyObject *args, PyObject *kw) >>>>> +{ >>>>> + PyObject *info_obj, *memory_source_obj = nullptr; >>>>> + static const char *keywords[] = { "info", "memory_source", nullptr }; >>>>> + if (!gdb_PyArg_ParseTupleAndKeywords (args, kw, "O!|O", keywords, >>>>> + &disasm_info_object_type, &info_obj, >>>>> + &memory_source_obj)) >>>> >>>> I'm wondering, why is there a separate memory_source parameter when info >>>> already has a read_memory method, that could potentially be overriden by >>>> the user? >>> >>> Here's how the API would be used right now: >>> >>> class MyDisassembler(Disassembler): >>> def __init__(self, name): >>> super().__init__(name) >>> self.__info = None >>> >>> def __call__(self, info): >>> self.__info = info >>> result = gdb.disassembler.builtin_disassemble(info, self) >>> >>> def read_memory(self, length, offset): >>> return self.__info.read_memory(length, offset) >>> >>> This is obviosly pretty pointless, the memory source just calls the >>> standard read_memory routine so you'll get the same behaviour as if no >>> memory source was passed at all, but it shows how the API works. >>> >>> If we wanted to override the DisassembleInfo.read_memory routine we'd >>> do something like this: >>> >>> class MyInfo(DisassembleInfo): >>> def __init__(self,old_info): >>> super().__init__(old_info) >>> >>> def read_memory(self, length, offset): >>> return super().read_memory(length, offset) >>> >>> class MyDisassembler(Disassembler): >>> def __init__(self, name): >>> super().__init__(name) >>> >>> def __call__(self, info): >>> wrapped_info = MyInfo(info) >>> result = gdb.disassembler.builtin_disassemble(wrapped_info) >>> >>> What are your thoughts on that? I think that would be pretty easy to >>> implement if you feel its an improvement. >> >> It's been a bit too long since I've looked at this patch so I've lost >> context. I only remember thinking that instead of passing a >> memory_source, you could simply pass an info with a special read_memory >> implementation that reads from whatever you want. So I think >> essentially what you did in that last example. The only remember I >> would mention this is that it seems better to me to not have two ways of >> doing the same thing, as it's sometimes ambiguous what happens when they >> are both used, how they interact together. > > I think what happened is when I started writing the above I was planning > to argue that my "memory_source" approach was best, but as I sketched > out the sub-classing DisassembleInfo approach I realised that your idea > probably was better. > > So I updated my patch, and then failed to reword the above text. > > I think we're all in alignment now, the way to intercept memory writes > is to sub-class DisassembleInfo (like the second example above), so I > think this issue is resolved. > >> >>>> Is there a use case for other error kinds? For example, if the >>>> disassembler finds that the machine code does not encode a valid >>>> instruction, what should it do? >>>> >>>> Maybe this is more a question for the gdb.disassembler.Disassembler >>>> implementation side of the API. >>> >>> No! Like the comment says, everything should disassemble to something, >>> even if it's just ".byte xxxx". The libopcodes disassembler only >>> supports reporting one type of error, that's memory error. It's just >>> unfortunate libopcodes also uses the return value to indicate that an >>> error occurred, and in some cases disassemblers return -1 (to indicate >>> error) without setting a memory error. In these cases the disassembler >>> has probably written to the output stream what went wrong, but this >>> really is not how libopcodes is supposed to work. >>> >>> So, no. We either have a memory error, or an unknown error. Ideally, >>> given enough time, libopcodes will be fixed so that we _only_ ever emit >>> memory errors. >> >> Ok. Well, for an API that allows users to arbitrarily extend >> functionality, it makes sense to always allow returning a generic error >> of some kind. Perhaps libopcodes can't fail, but perhaps (a bit >> stretched example to illustrate) my disassembler makes a GET request to >> an HTTP server to do the actual disassembling. So it seems good to me >> to always have a "something went wrong" escape hatch. > > I'll take another pass as this aspect and check how throwing arbitrary > errors from Python code is presented to the user. I've reworked how exceptions are handled through the whole Python disassembler process now. If we consider a disassembler like this: class ExampleDisassembler(gdb.disassembler.Disassembler): class InfoWrapper(gdb.disassembler.DisassembleInfo): def __init__(self, info): super().__init__(info) def read_memory(self, length, offset): buffer = super().read_memory(length, offset) return buffer def __init__(self): super().__init__("ExampleDisassembler") def __call__(self, info): info = self.InfoWrapper(info) result = gdb.disassembler.builtin_disassemble(info) return result Then this will result in a call stack like this: gdbpy_print_insn (py-disasm.c) ExampleDisassembler.__call__ (user's Python code) disasmpy_builtin_disassemble (py-disasm.c) InfoWrapper.read_memory (user's Python code) We can split the exceptions into 3 types: 1. gdb.MemoryError, if this is raised in `read_memory` then the bultin disassembler might mask this exception, or it might choose to re-raise a gdb.MemoryError of its own. If the MemoryError is raised then this will be seen (and can be caught) in the ExampleDisassembler.__call__. If the MemoryError is not caught then gdbpy_print_insn will handle it, the output looks like: (gdb) disassemble 0x0000000000401194,0x0000000000401198 Dump of assembler code from 0x401194 to 0x401198: 0x0000000000401194 : Cannot access memory at address 0x401194 2. gdb.GdbError, if an instance of this exception is raised in `read_memory` then it will propagate back and be catchable in ExampleDisassembler.__call__. If the exception is not handled there then it will be caught and handled by gdbpy_print_insn. In keeping with how these exceptions are described in the `Exception Handling` section of the manual, these exceptions are not treated as errors in the Python code, but a mechanism for the user to report errors to the user. If one of these makes it to gdbpy_print_insn, this is what the output looks like: (gdb) disassemble 0x0000000000401194,0x0000000000401198 Dump of assembler code from 0x401194 to 0x401198: 0x0000000000401194 : this is a GdbError unknown disassembler error (error = -1) The whole "unknown disassembler error" is a consequence of the core GDB disassembler only having a mechanism to handle memory errors. Fixing this completely is not trivial though as I think a perfect solution would require updates to libopcodes. Still, we could potentially present this error to the user in a slightly better way, but I'd like to defer changes in that area until later - it doesn't impact the Python API at all, so we can polish that part later with no backwards compatibility worries I think. 3. Any other exception type. If any other exception type is raised from `read_memory` then, as with GdbError, the exception is propagated back, and can be caught in ExampleDisassembler.__call__. If the exception is not caught there then it is handled in gdbpy_print_insn, the output looks like this: (gdb) disassemble 0x0000000000401194,0x0000000000401198 Dump of assembler code from 0x401194 to 0x401198: 0x0000000000401194 : Python Exception : this is a RuntimeError unknown disassembler error (error = -1) Other exception types are handled as errors in the user code, and so its possible to get a stack-trace using 'set python print-stack full'. Other significant changes I've made: * After gdbpy_print_insn has called ExampleDisassembler.__call__, any errors found after this point will be reported as errors and stop the disassembler. Previously, the error would be reported, but the default disassembler would then be used. While working on the latest changes I decided that behaviour was not helpful, so removed it. * There are a few places where the libopcodes disassembler will return -1 (indicating an error), without first calling the memory_error callback. As far as I'm concerned, this is a bug in libopcodes. An example of this can be seen by looking for the string '64-bit address is disabled' in i386-dis.c. Previously, these cases were converted to gdb.MemoryError exceptions, however, this meant that there was a difference in behaviour between having no Python disassemblers, and a "default", pass-through disassembler written in Python. I now convert these cases into gdb.GdbError exceptions, with the payload of the exception being any text the disassembler has emitted so far. Combined with the updated handling of GdbError (described above), this means that a "default", pass-through disassembler, written in Python, gives the exact same output as GDB when no Python disassemblers are being used. I'll follow up shortly with an updated version of this series that includes all the fixes I describe above. I know you said you're likely too busy to review any further versions of this series, which is fine, I'll give the new version a little time for others to look at, and then I'll go ahead and merge it as I think we probably align on most of this stuff now. Thanks, Andrew