From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <ktsanaktsidis@zendesk.com>
Received: from us-smtp-delivery-110.mimecast.com
 (us-smtp-delivery-110.mimecast.com [170.10.133.110])
 by sourceware.org (Postfix) with ESMTP id 468813857006
 for <elfutils-devel@sourceware.org>; Sun,  3 Oct 2021 05:05:37 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.4.1 sourceware.org 468813857006
Received: from mail-pj1-f70.google.com (mail-pj1-f70.google.com
 [209.85.216.70]) (Using TLS) by relay.mimecast.com with ESMTP id
 us-mta-484-Ulxf4jvHNDuvdgZlq2I9Nw-1; Sun, 03 Oct 2021 01:05:35 -0400
X-MC-Unique: Ulxf4jvHNDuvdgZlq2I9Nw-1
Received: by mail-pj1-f70.google.com with SMTP id
 fh15-20020a17090b034f00b0019f68a33966so5306387pjb.1
 for <elfutils-devel@sourceware.org>; Sat, 02 Oct 2021 22:05:35 -0700 (PDT)
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
 d=1e100.net; s=20210112;
 h=x-gm-message-state:mime-version:references:in-reply-to:from:date
 :message-id:subject:to:cc;
 bh=IJJQMD+5u2qe3Rs1JzahzHVp7YiGGlAsFI8qwg4dCkQ=;
 b=7ZneaC9ONlo+o8NhRde187HoEzkN1HmT5Co5cGUtrxedPbaeScGpLJWSyQsNt8M38a
 JLXYBrqgRClWlkv2mOh73XTySggfvlTfXvZXbHYwk05ioCtqcXLcdKSQPu1P2tGPV+3Z
 FMTvLhmM54xyQzR5lO2NI1s3hAhTTJNTomdh+AYTl8+QmdBIGiuUgqrgyuUfEHi+RJda
 o93hwXHtMo6XN8l36c+GVWEf3H/MIaJajDj/+o2zbRIQSotYQi59TDrjBqbf1kuP1yOu
 X2A/HVVFtYDyYtCfRvZRW478vGrbMvk127fR/4MXMZEL3iMVIxvIy/6yyIAdAe/fut8I
 TrAA==
X-Gm-Message-State: AOAM5333c/UbD0b+XyYuUyHf2/E2MvcpwTajBbtNkmnixwV0r7HBPYXZ
 gIfGQCojWXa7UTWrESujBAfIES8O3b8NR97vMjKj0OGMZxa6V2cVAdn4bjX1j/HQA5XBSTo82jC
 c29z6oQhHjyiTw9/nB98Xj7bDAE5qCI8XKW625mZV
X-Received: by 2002:a17:90a:191a:: with SMTP id
 26mr29570088pjg.118.1633237534024; 
 Sat, 02 Oct 2021 22:05:34 -0700 (PDT)
X-Google-Smtp-Source: ABdhPJzZDOFF4n315L93yEhI1MixK95anhfbPaNWNw+Q4bG26C1oV2FoazvAiq9SPUC/xVCkrX/GRSuoznJtSNVQFKY=
X-Received: by 2002:a17:90a:191a:: with SMTP id
 26mr29570058pjg.118.1633237533515; 
 Sat, 02 Oct 2021 22:05:33 -0700 (PDT)
MIME-Version: 1.0
References: <CAJ7wOOvKDx2TakFm2dA82DmjsyCETuz0gKAR6taorx5eHArTBA@mail.gmail.com>
 <afebae258ba067f19c025661babc6c341efc49b5.camel@klomp.org>
In-Reply-To: <afebae258ba067f19c025661babc6c341efc49b5.camel@klomp.org>
From: KJ Tsanaktsidis <ktsanaktsidis@zendesk.com>
Date: Sun, 3 Oct 2021 16:05:22 +1100
Message-ID: <CAJ7wOOu7Pmf6FoFmAYQX=tBqxxCdnx3+pjtx2=2-VQSzp9gc-w@mail.gmail.com>
Subject: Re: dwarf_aggregate_size doesn't work with arrays in partial CUs
To: Mark Wielaard <mark@klomp.org>
Cc: elfutils-devel@sourceware.org
X-Mimecast-Spam-Score: 0
X-Mimecast-Originator: zendesk.com
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: quoted-printable
X-Spam-Status: No, score=-2.9 required=5.0 tests=BAYES_00, DKIMWL_WL_HIGH,
 DKIM_SIGNED, DKIM_VALID, DKIM_VALID_AU, DKIM_VALID_EF, RCVD_IN_DNSWL_LOW,
 RCVD_IN_MSPIKE_H3, RCVD_IN_MSPIKE_WL, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=ham autolearn_force=no version=3.4.4
X-Spam-Checker-Version: SpamAssassin 3.4.4 (2020-01-24) on
 server2.sourceware.org
X-BeenThere: elfutils-devel@sourceware.org
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Elfutils-devel mailing list <elfutils-devel.sourceware.org>
List-Unsubscribe: <https://sourceware.org/mailman/options/elfutils-devel>,
 <mailto:elfutils-devel-request@sourceware.org?subject=unsubscribe>
List-Archive: <https://sourceware.org/pipermail/elfutils-devel/>
List-Help: <mailto:elfutils-devel-request@sourceware.org?subject=help>
List-Subscribe: <https://sourceware.org/mailman/listinfo/elfutils-devel>,
 <mailto:elfutils-devel-request@sourceware.org?subject=subscribe>
X-List-Received-Date: Sun, 03 Oct 2021 05:05:38 -0000

On Thu, Sep 30, 2021 at 12:27 AM Mark Wielaard <mark@klomp.org> wrote:
>
> Hi KJ,
>
> On Sat, 2021-09-25 at 17:21 +1000, KJ Tsanaktsidis via Elfutils-devel
> wrote:
> > I'm writing a program that uses ptrace to poke at internal OpenSSL
> > data structures for another process. I'm using libdw to parse the
> > DWARF data for the copy of OpenSSL actually linked in to the target
> > process, so I can extract struct offsets, member sizes and the like
> > and poke at the right places.
> >
> > I've run into an issue where dwarf_aggregate_size can't calculate the
> > size of an array, when the array is included in a partial CU
> > (DW_TAG_partial_unit). If the array unit includes a DW_AT_upper_bound
> > attribute, but not a DW_AT_lower_bound attribute, then
> > dwarf_aggregate_size will infer the lower bound based on the
> > DW_AT_language attribute of the enclosing CU (i.e. whether the language
> > uses zero or one based indexing).
> >
> > However, the debug symbols I'm looking at for OpenSSL from the Ubuntu
> > repositories have the DW_AT_language on the full compilation unit
> > entries, but not in the partial ones included in them. This means that
> > calling dwarf_aggregate_size on the array type DIE does not work.
>
> That is indeed a problem, since dwarf_aggregate_size doesn't provide
> another way to provide the language to use for the
> dwarf_default_lower_bound call. And the default is to return a
> DWARF_E_UNKNOWN_LANGUAGE error.
>
> Maybe we should change the default to assume the lower bound is zero?
>
> > The DWARF spec doesn't really seem to have anything to say on the
> > matter (all it says is "A full or partial compilation unit entry may
> > have the following attributes", but doesn't say what it logically
> > means if an attribute is present on the complete CU but not a partial
> > one).
>
> I think it is assumed that it inherits those attributes from the CU
> from which the partial one was imported and/or from the CU of the DIE
> that referenced the DIE in the partial unit. But I don't think it is
> easy to track that with libdw currently.
>
> > I guess it doesn't really make sense for a single compilation unit to
> > contain multiple languages? So I wonder if dwarf_srclang (called by
> > dwarf_aggregate_size) should crawl through the list of CU's to see if
> > the DIE's CU is included in a CU that _does_ specify DW_AT_language
> > (recursively, I suppose). Then, we can infer that the partial CU's
> > language is the same as the enclosing one.
> >
> > If people reckon this is a good idea (or, have a better one!), I'm
> > happy to try and put together a patch.
>
> I think that suggestion is sound, but really expensive. It also is
> somewhat tricky if you have alt files, you'll have to track back to the
> original Dwarf to see if it imports one of the partial units from the
> alt file.
>
> But I also don't have a good alternative idea. We could maybe have a
> variant of dwarf_aggregate_size that takes a language default value,
> but that doesn't seem like a very generic solution. Or maybe a variant
> of dwarf_srclang that takes any DIE (not just a CU DIE) and which tries
> to figure out the best language to use, which falls back to some
> default value if it cannot figure out what the language is that can be
> used with dwarf_default_lower_bound to get a default (most likely
> zero)?
>
> We could also ask producers (like dwz) to always include a
> DW_AT_language for partial units they create. But that of course makes
> the partial units bigger (and at least dwz creates them to make the
> full debuginfo smaller).
>
> Cheers,
>
> Mark
>

Agreed, we probably don't want to hide a really expensive traversal
inside what looks like a simple call to dwarf_aggregate_size.

What if we instead provide a way for the user to specify which language
a CU uses? Something like
"dwarf_cu_report_language(Dwarf_Die *cu, int lang)". The reported value
would be saved with the (partial) CU, and dwarf_srclang could fall back
to it when DW_AT_language isn't set. The user could then recursively
traverse all CUs and call dwarf_cu_report_language on each partial CU.
As a bonus, we could even wrap that traversal up in
dwarf_cu_traverse_partial_cu_set_language or something (OK, the name
needs a bit of workshopping).
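
To make the lookup order concrete, here is a rough sketch. It uses
mock types rather than real libdw structures: struct mock_cu,
mock_cu_report_language, mock_srclang, mock_default_lower_bound and
mock_array_count are all hypothetical names for illustration only, and
the two language constants just mirror the DWARF values for C99 and
Fortran 90. It is not meant as the actual patch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a CU; NOT libdw's real Dwarf_CU. */
struct mock_cu {
    int has_at_language;   /* nonzero if the unit DIE has DW_AT_language */
    int at_language;       /* value of DW_AT_language, when present */
    int reported_language; /* set by the user; 0 means nothing reported */
};

/* Mirror the DWARF codes for C99 (0x0c) and Fortran 90 (0x08). */
enum { MOCK_LANG_C99 = 0x0c, MOCK_LANG_FORTRAN90 = 0x08 };

/* Sketch of the proposed setter: remember a user-supplied language. */
void mock_cu_report_language(struct mock_cu *cu, int lang)
{
    assert(cu != NULL);
    cu->reported_language = lang;
}

/* Prefer DW_AT_language, then the reported value; -1 if neither exists. */
int mock_srclang(const struct mock_cu *cu)
{
    assert(cu != NULL);
    if (cu->has_at_language)
        return cu->at_language;
    if (cu->reported_language != 0)
        return cu->reported_language;
    return -1;
}

/* Zero-based for C, one-based for Fortran; -1 for an unknown language. */
int mock_default_lower_bound(int lang, long *result)
{
    assert(result != NULL);
    switch (lang) {
    case MOCK_LANG_C99:       *result = 0; return 0;
    case MOCK_LANG_FORTRAN90: *result = 1; return 0;
    default:                  return -1;
    }
}

/* Element count of an array that only has an upper bound. */
int mock_array_count(const struct mock_cu *cu, long upper, long *count)
{
    long lower;
    assert(count != NULL);
    if (mock_default_lower_bound(mock_srclang(cu), &lower) != 0)
        return -1;
    *count = upper - lower + 1;
    return 0;
}
```

With this ordering, a partial CU without DW_AT_language fails the size
calculation until a language has been reported for it, and a CU that
does carry DW_AT_language is never overridden by the reported value.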

That way, the expensive traversal lives in a separate call that is
documented as being very expensive (and its result is cached, so it
only needs to be done once). Does that sound like a reasonable approach?
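
For the traversal itself, I am picturing something along these lines.
Again this is only a sketch over a mock data model, assuming each unit
records which other units it imports; struct mock_unit, struct
mock_dwarf and mock_propagate_languages are hypothetical names, and
the real thing would have to walk DW_TAG_imported_unit DIEs and deal
with alt files.

```c
#include <assert.h>
#include <stddef.h>

#define MOCK_MAX_IMPORTS 4

/* Hypothetical unit: a language (0 if absent) plus the units it imports. */
struct mock_unit {
    int language;
    size_t n_imports;
    size_t imports[MOCK_MAX_IMPORTS]; /* indices into mock_dwarf.units */
};

/* Hypothetical container standing in for a Dwarf handle. */
struct mock_dwarf {
    struct mock_unit *units;
    size_t n_units;
    int languages_propagated; /* cache flag: the walk was already done */
    unsigned walks;           /* number of real walks, for illustration */
};

/* Push the language of unit idx down into imported units lacking one. */
static void push_language(struct mock_dwarf *dbg, size_t idx)
{
    const struct mock_unit *u = &dbg->units[idx];
    for (size_t i = 0; i < u->n_imports; i++) {
        size_t c = u->imports[i];
        assert(c < dbg->n_units);
        /* Only units without a language are filled in and recursed
           into, so each unit is assigned at most once and import
           cycles terminate. */
        if (dbg->units[c].language == 0) {
            dbg->units[c].language = u->language;
            push_language(dbg, c);
        }
    }
}

/* The expensive walk; repeated calls return immediately. */
void mock_propagate_languages(struct mock_dwarf *dbg)
{
    assert(dbg != NULL);
    if (dbg->languages_propagated)
        return;
    dbg->walks++;
    for (size_t i = 0; i < dbg->n_units; i++)
        if (dbg->units[i].language != 0)
            push_language(dbg, i);
    dbg->languages_propagated = 1;
}
```

In this sketch the first importer to reach a partial unit decides its
language, and a partial unit that nothing imports is left without one,
so the caller would still see the unknown-language error for it.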