From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from us-smtp-delivery-74.mimecast.com (us-smtp-delivery-74.mimecast.com [170.10.133.74]) by sourceware.org (Postfix) with ESMTPS id 13AC63858D3C for ; Thu, 5 May 2022 17:39:56 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 13AC63858D3C Received: from mail-ed1-f69.google.com (mail-ed1-f69.google.com [209.85.208.69]) by relay.mimecast.com with ESMTP with STARTTLS (version=TLSv1.2, cipher=TLS_ECDHE_RSA_WITH_AES_256_GCM_SHA384) id us-mta-584-x-NbGBe_N5q5H1a9Ytc4-w-1; Thu, 05 May 2022 13:39:54 -0400 X-MC-Unique: x-NbGBe_N5q5H1a9Ytc4-w-1 Received: by mail-ed1-f69.google.com with SMTP id z20-20020a50f154000000b0042815e3008cso2652560edl.15 for ; Thu, 05 May 2022 10:39:54 -0700 (PDT) X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20210112; h=x-gm-message-state:from:to:cc:subject:in-reply-to:references:date :message-id:mime-version; bh=gBoru3YZzrIeMMHqfl+FpD/32JKIVf93F6Bd+HHW/C8=; b=3Tyv5LHdsVuPEGB1n1oXKcX6P47fi1h4dVG7Nlicpt/3e+anMU3QVdcXNq5+W0A8KT QGQ8ZaqERYMQMET+CtxiOrg0+3S3Ton/DZYCUrowQ45lhE3m7JxF2KJkW2sWvNdWDZzv qHFC0hLpY4WsfQicHhIrztYctS80N4pkmmNmPsvGSisKbhoiDz63/lJ2eSExpRB+lHlX fROq4IPstlz9NqtUliuPj1Ooxk8F/ELC28HeVoxqApTXOkJM+j7ksGFM4FU3YgoZ0xen PFXx6BHe9IqbAzvKA31M8FGK+nUy+GYouJ3fMAhaLrzkDhftKeIcPXZnK5G86/j+7nca ndIw== X-Gm-Message-State: AOAM530eY4/N9A+pivEfrqkNeyrHFPxC6H3/QbaZlJHyaB1H+uVrDWJt 1slBsTgoYjK6pnuJ4loO0IWYHDS703jmAO20rgtxbnrv9hv0QOPUIqb3Mnd6gcUSomnZqSlnNBR KRyFWUCSwWKFuHqLWGqCrPQ== X-Received: by 2002:a17:906:854b:b0:6f4:58a7:5a07 with SMTP id h11-20020a170906854b00b006f458a75a07mr19627509ejy.440.1651772392941; Thu, 05 May 2022 10:39:52 -0700 (PDT) X-Google-Smtp-Source: ABdhPJyBvFsZBn1HsN6WHkjWX9Nvnc4BpCCnlnwSkAAo6dc7MnRo2gyYM3GSOGjQ7L/9+pPC6NHJ+g== X-Received: by 2002:a17:906:854b:b0:6f4:58a7:5a07 with SMTP id h11-20020a170906854b00b006f458a75a07mr19627479ejy.440.1651772392600; Thu, 05 May 2022 10:39:52 -0700 (PDT) Received: from localhost (92.40.179.200.threembb.co.uk. [92.40.179.200]) by smtp.gmail.com with ESMTPSA id j37-20020a05640223a500b0042617ba638bsm1081122eda.21.2022.05.05.10.39.51 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Thu, 05 May 2022 10:39:51 -0700 (PDT) From: Andrew Burgess To: Simon Marchi , gdb-patches@sourceware.org Cc: Andrew Burgess Subject: Re: [PATCHv4 1/5] gdb: add new base class to gdb_disassembler In-Reply-To: <0a8f29f9-a8f9-e178-6c89-90ff38be7596@simark.ca> References: <603bc5660c36f0c8c8aaf9248d5f652428b78832.1650878049.git.aburgess@redhat.com> <0a8f29f9-a8f9-e178-6c89-90ff38be7596@simark.ca> Date: Thu, 05 May 2022 18:39:50 +0100 Message-ID: <87czgrnajd.fsf@redhat.com> MIME-Version: 1.0 X-Mimecast-Spam-Score: 0 X-Mimecast-Originator: redhat.com Content-Type: text/plain X-Spam-Status: No, score=-5.0 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH, DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_NONE, SPF_HELO_NONE, SPF_NONE, TXREP, T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.4 X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on server2.sourceware.org X-BeenThere: gdb-patches@sourceware.org X-Mailman-Version: 2.1.29 Precedence: list List-Id: Gdb-patches mailing list List-Unsubscribe: , List-Archive: List-Post: List-Help: List-Subscribe: , X-List-Received-Date: Thu, 05 May 2022 17:39:58 -0000 Simon Marchi writes: >> From: Andrew Burgess >> >> The motivation for this change is an upcoming Python disassembler API >> that I would like to add. As part of that change I need to create a >> new disassembler like class that contains a disassemble_info and a >> gdbarch. The management of these two objects is identical to how we >> manage these objects within gdb_disassembler, so it might be tempting >> for my new class to inherit from gdb_disassembler. >> >> The problem however, is that gdb_disassembler has a tight connection >> between its constructor, and its print_insn method. In the >> constructor the ui_file* that is passed in is replaced with a member >> variable string_file*, and then in print_insn, the contents of the >> member variable string_file are printed to the original ui_file*. >> >> What this means is that the gdb_disassembler class has a tight >> coupling between its constructor and print_insn; the class just isn't >> intended to be used in a situation where print_insn is not going to be >> called, which is how my (upcoming) sub-class would need to operate. >> >> My solution then, is to separate out the management of the >> disassemble_info and gdbarch into a new gdb_disassemble_info class, >> and make this class a parent of gdb_disassembler. >> >> In arm-tdep.c and mips-tdep.c, where we used to cast the >> disassemble_info->application_data to a gdb_disassembler, we can now >> cast to a gdb_disassemble_info as we only need to access the gdbarch >> information. >> >> Now, my new Python disassembler sub-class will still want to print >> things to an output stream, and so we will want access to the >> dis_asm_fprintf functionality for printing. >> >> However, rather than move this printing code into the >> gdb_disassemble_info base class, I have added yet another level of >> hierarchy, a gdb_printing_disassembler, thus the class structure is >> now: >> >> struct gdb_disassemble_info {}; >> struct gdb_printing_disassembler : public gdb_disassemble_info {}; >> struct gdb_disassembler : public gdb_printing_disassembler {}; > > I can't explain it very well, but it seems a little strange to make the > other classes inherit from gdb_disassemble_info. From the name (and the > code), I understand that gdb_disassemble_info purely holds the data > necessary for the disassemblers to do their job. So this is not really > an IS-A relationship, but more a HAS-A. So I would think composition > would be more natural (i.e. gdb_printing_disassembler would have a field > of type gdb_disassemble_info). > > But with the callbacks we pass to struct disassemble_info (such as > fprintf_func and fprintf_styled_func), what is data and what is behavior > gets a little blurry. > > It doesn't matter much in the end, I'm fine with what you have. And it > can always be refactored later. Thanks. I took another look at this code and started by simplifying things so that we had just gdb_disassemble_info and gdb_disassembler. Instead of having gdb_disassembler inherit from gdb_disassemble_info, gdb_disassembler contained a gdb_disassemble_info. Then I pulled the rest of the patches in on top and tried to get things working again. In the end I liked the new code a lot less that what I have here. The problem I ran into is that it is the gdb_disassemble_info that is held in the disassemble_info::application_data field. It is important (I think) that this is the case, as we want code like gdb_print_insn_arm (arm-tdep.c) to be able to make use of the application_data without knowing precisely which disassembler class triggered the disassembly call, e.g. we could be calling from a Python disassembler (after patch #3 in this series), or from the "normal" disassembler (e.g. after an x/i command), or we could be calling from the gdb_disassembler_test disassembler (from the selftest files). In all cases the application_data is always a sub-class of gdb_disassemble_info. So, when gdb_disassembler no longer inherits from the base type, but instead contains a base type then what do we place in the application_data field? I think the only choice is to still place a gdb_disassemble_info in that field, but this means that we end up needing a mechanism to get from the application_data back to the actual disassembler instance, here's why: Currently my proposal is something like this (after patch #3): struct gdb_disassemble_info struct gdb_printing_disassembler : public gdb_disassemble_info struct gdb_disassembler : public gdb_printing_disassembler struct gdbpy_disassembler : public gdb_printing_disassembler Each of these types are sub-classes of gdb_disassemble_info, and so, callbacks like gdb_print_insn_arm are fine to cast the application_data to the gdb_disassemble_info type and make use of it. But then we have other callbacks, e.g. gdb_disassembler::dis_asm_print_address and (from patch #3) gdbpy_disassembler::read_memory_func, these callbacks are disassembler specific. In these we don't cast the application_data to gdb_disassemble_info, but to the specific disassembler type, gdb_disassembler and gdbpy_disassembler in the two examples I listed above. But there's another example in gdb_disassembler_test::read_memory and ther might be more about. The neat thing (I think) about using inheritance here is that this all "just works", a callback defined within a particular disassembler can assume that disassembler type. A more generic callback can just assume the generic gdb_disassemble_info type. So after I tried restructuring things like this: struct gdb_disassemble_info struct gdb_disassembler struct gdbpy_disassembler Where both gdb_disassembler and gdbpy_disassembler contain a gdb_disassemble_info. I still ended up placing the gdb_disassemble_info into the application_data field. However, now in the disassembler specific callbacks how do I get back to the correct disassembler type? I could give the gdb_disassemble_info a generic void* field, and then place any pointer I want in there, but, I really don't want to do that. Passing generic pointers around is almost never necessary (I claim) when we have inheritance. So what I did instead was this: struct gdb_disassemble_info struct gdb_disassembler struct gdbpy_disassemble_info : public gdb_disassemble_info struct gdbpy_disassembler Now gdb_disassembler contains a gdb_disassemble_info, and gdbpy_disassembler contains a gdbpy_disassemble_info. The callback problems is solved; generic callbacks can still treat the application_data as a gdb_disassemble_info, but the disassembler specific callbacks will know it's a particular sub-class of gdb_disassemble_info (e.g. gdbpy_disassemble_info), and can cast the application_data to that type. The gdbpy_disassemble_info then holds a pointer to the gdbpy_disassembler, and all is good. Of course, the above is only a sub-set of what's needed, the disassembler self-tests end up needing another gdb_disassemble_info sub-class, and maybe there would be more needed if I'd actually taken this patch to completion? It feels like all I've done is move complexity out of disasm.h and forced the complexity onto each user of the disassembler, which isn't great. The current approach might not be great, but actually using the disassembler is pretty straight forward. The other problem with the above is knowing what state to keep in e.g. gdbpy_disassembler and what to move into gdbpy_disassemble_info? I played around with this a bit, but to some degree I felt like I was forcing myself to keep gdbpy_disassembler around, it almost felt more natural to move everything into gdbpy_disassemble_info, in which case, aren't I almost back where I started? Except I've moved the code out of disasm.h and to the site of each user? I know you said you'd be willing to see my original patch go in and refactor later - and I am of course happy to see the code reworked later - but I wanted to see if I could get something that you'd be happy with. Anyway, if you have any suggestions I willing to take another pass at this code. > >> +gdb_disassemble_info::gdb_disassemble_info >> + (struct gdbarch *gdbarch, struct ui_file *stream, >> + read_memory_ftype read_memory_func, memory_error_ftype memory_error_func, >> + print_address_ftype print_address_func, fprintf_ftype fprintf_func, >> + fprintf_styled_ftype fprintf_styled_func) >> + : m_gdbarch (gdbarch) >> { >> - init_disassemble_info (&m_di, &m_buffer, dis_asm_fprintf, >> - dis_asm_styled_fprintf); >> + gdb_assert (fprintf_func != nullptr); >> + gdb_assert (fprintf_styled_func != nullptr); >> + init_disassemble_info (&m_di, stream, fprintf_func, >> + fprintf_styled_func); >> m_di.flavour = bfd_target_unknown_flavour; >> - m_di.memory_error_func = dis_asm_memory_error; >> - m_di.print_address_func = dis_asm_print_address; >> - /* NOTE: cagney/2003-04-28: The original code, from the old Insight >> - disassembler had a local optimization here. By default it would >> - access the executable file, instead of the target memory (there >> - was a growing list of exceptions though). Unfortunately, the >> - heuristic was flawed. Commands like "disassemble &variable" >> - didn't work as they relied on the access going to the target. >> - Further, it has been superseeded by trust-read-only-sections >> - (although that should be superseeded by target_trust..._p()). */ >> - m_di.read_memory_func = read_memory_func; >> + if (memory_error_func != nullptr) >> + m_di.memory_error_func = memory_error_func; >> + if (print_address_func != nullptr) >> + m_di.print_address_func = print_address_func; >> + if (read_memory_func != nullptr) >> + m_di.read_memory_func = read_memory_func; > > Are the nullptr checks needed? Are these fields initially nullptr, or > they have some other value? Yes they are needed. The three fields protected here are given non-null values by the call to init_disassemble_info, we only want to replace these defaults with a user supplied value if the user supplied value is not nullptr. Placing nullptr into these fields will cause the disassembler to crash! This was documented in the header file on gdb_disassemble_info, but I've added an extra comment in the code here to make it clear what's going on. > > >> +struct gdb_disassemble_info >> +{ >> + DISABLE_COPY_AND_ASSIGN (gdb_disassemble_info); >> >> - /* Return the gdbarch of gdb_disassembler. */ >> + /* Return the gdbarch we are disassembing for. */ > > disassembing -> disassembling Fixed. I've working on updating patch #3 and will repost this series once that's done. Thanks, Andrew