From: KJ Tsanaktsidis
Date: Sun, 3 Oct 2021 16:05:22 +1100
Subject: Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
To: Mark Wielaard
Cc: elfutils-devel@sourceware.org
List-Id: Elfutils-devel mailing list

On Thu, Sep 30, 2021 at 12:27 AM Mark Wielaard wrote:
>
> Hi KJ,
>
> On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
> wrote:
> > I'm writing a program that uses ptrace to poke at internal OpenSSL
> > data structures for another process. I'm using libdw to parse the
> > DWARF data for the copy of OpenSSL actually linked into the target
> > process, so I can extract struct offsets, member sizes and the like
> > and poke at the right places.
> >
> > I've run into an issue where dwarf_aggregate_size can't calculate the
> > size of an array, when the array is included in a partial CU
> > (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> > attribute, but not a DW_AT_lower_bound attribute, then
> > dwarf_aggregate_size will infer the lower bound based on the
> > DW_AT_language attribute of the enclosing CU (i.e. whether the
> > language uses zero- or one-based indexing).
> >
> > However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> > repositories have the DW_AT_language on the full compilation unit
> > entries, but not in the partial ones included in them.
> > This means that calling dwarf_aggregate_size on the array type DIE
> > does not work.
>
> That is indeed a problem, since dwarf_aggregate_size doesn't offer
> another way to supply the language to use for the
> dwarf_default_lower_bound call. And the default is to return a
> DWARF_E_UNKNOWN_LANGUAGE error.
>
> Maybe we should change the default to assume the lower bound is zero?
>
> > The DWARF spec doesn't really seem to have anything to say on the
> > matter (all it says is "A full or partial compilation unit entry may
> > have the following attributes", but doesn't say what it logically
> > means if an attribute is present on the complete CU but not a partial
> > one).
>
> I think it is assumed that it inherits those attributes from the CU
> from which the partial one was imported and/or from the CU of the DIE
> that referenced the DIE in the partial unit. But I don't think it is
> easy to track that with libdw currently.
>
> > I guess it doesn't really make sense for a single compilation unit to
> > contain multiple languages? So I wonder if dwarf_srclang (called by
> > dwarf_aggregate_size) should crawl through the list of CUs to see if
> > the DIE's CU is included in a CU that _does_ specify DW_AT_language
> > (recursively, I suppose). Then, we can infer that the partial CU's
> > language is the same as the enclosing one.
> >
> > If people reckon this is a good idea (or, have a better one!), I'm
> > happy to try and put together a patch.
>
> I think that suggestion is sound, but really expensive. It also is
> somewhat tricky if you have alt files, you'll have to track back to the
> original Dwarf to see if it imports one of the partial units from the
> alt file.
>
> But I also don't have a good alternative idea. We could maybe have a
> variant of dwarf_aggregate_size that takes a language default value,
> but that doesn't seem like a very generic solution.
> Or maybe a variant of dwarf_srclang that takes any DIE (not just a CU
> DIE) and tries to figure out the best language to use, falling back to
> some default value if it cannot, so that the result can be passed to
> dwarf_default_lower_bound to get a default (most likely zero)?
>
> We could also ask producers (like dwz) to always include a
> DW_AT_language for partial units they create. But that of course makes
> the partial units bigger (and at least dwz creates them to make the
> full debuginfo smaller).
>
> Cheers,
>
> Mark
>

I guess we don't want to hide a really expensive traversal inside a
simple call to dwarf_aggregate_size, no...

What if we instead provide a way for the user to specify what language
a CU is? Something like
"dwarf_cu_report_language(Dwarf_Die *cu, int lang)". That would get
saved with the (partial) CU, and dwarf_srclang could fall back to this
information when DW_AT_language isn't set. Then the user could
recursively traverse all CUs and call dwarf_cu_report_language on each
partial CU.

As a bonus, we could even wrap that up in
dwarf_cu_traverse_partial_cu_set_language or something (OK, the name
needs a bit of workshopping). That way, the expensive thing is in a
separate call that's marked as being very expensive (and cached, so it
only needs to be done once).

Sound like a reasonable approach?
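
To make the caching idea concrete, here is a rough, self-contained
sketch. It is a toy model and not libdw code: struct toy_cu, the
toy_* functions and the LANG_* constants are all hypothetical
stand-ins (the numeric values match DW_LANG_C99 and DW_LANG_Fortran90
from the DWARF spec), and the imports array only approximates what
DW_TAG_imported_unit would give you. It assumes a unit keeps the first
language reported for it, which may not be the right policy if a
partial unit is imported from CUs in different languages.

```c
#include <assert.h>
#include <stddef.h>

/* Toy model only: none of these types or functions exist in libdw. */
enum { LANG_UNKNOWN = 0, LANG_FORTRAN90 = 0x08, LANG_C99 = 0x0c };

#define MAX_IMPORTS 4

struct toy_cu {
    int attr_language;      /* DW_AT_language, or LANG_UNKNOWN if absent */
    int reported_language;  /* cache filled by toy_cu_report_language */
    struct toy_cu *imports[MAX_IMPORTS]; /* units this one imports */
    size_t n_imports;
};

/* Hypothetical counterpart of the proposed dwarf_cu_report_language. */
static void toy_cu_report_language(struct toy_cu *cu, int lang)
{
    assert(cu != NULL);
    cu->reported_language = lang;
}

/* Hypothetical dwarf_srclang fallback: the attribute wins, then the cache. */
static int toy_srclang(const struct toy_cu *cu)
{
    if (cu->attr_language != LANG_UNKNOWN)
        return cu->attr_language;
    return cu->reported_language;
}

/* The expensive pass, run once per unit that has a known language.
   It only descends into units it has just labelled, so each unit is
   labelled at most once and import cycles cannot loop forever. */
static void toy_propagate_language(struct toy_cu *cu)
{
    int lang = toy_srclang(cu);
    if (lang == LANG_UNKNOWN)
        return; /* nothing to hand down */
    for (size_t i = 0; i < cu->n_imports; i++) {
        struct toy_cu *imp = cu->imports[i];
        if (toy_srclang(imp) == LANG_UNKNOWN) {
            toy_cu_report_language(imp, lang);
            toy_propagate_language(imp);
        }
    }
}

/* Loosely modelled on dwarf_default_lower_bound: 0 on success, -1 if
   the language is unknown to this toy table. */
static int toy_default_lower_bound(int lang, long *result)
{
    switch (lang) {
    case LANG_C99:       *result = 0; return 0;
    case LANG_FORTRAN90: *result = 1; return 0;
    default:             return -1;
    }
}
```

The intended use would be: call toy_propagate_language once on every
unit that carries its own language, after which toy_srclang on a
partial unit returns the inherited value and the lower-bound lookup no
longer fails. Alt files are not modelled here at all.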