From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: from mail-oi1-x22e.google.com (mail-oi1-x22e.google.com [IPv6:2607:f8b0:4864:20::22e]) by sourceware.org (Postfix) with ESMTPS id E86503858D3C for ; Tue, 7 Nov 2023 06:45:11 +0000 (GMT) DMARC-Filter: OpenDMARC Filter v1.4.2 sourceware.org E86503858D3C Authentication-Results: sourceware.org; dmarc=none (p=none dis=none) header.from=osandov.com Authentication-Results: sourceware.org; spf=none smtp.mailfrom=osandov.com ARC-Filter: OpenARC Filter v1.0.0 sourceware.org E86503858D3C Authentication-Results: server2.sourceware.org; arc=none smtp.remote-ip=2607:f8b0:4864:20::22e ARC-Seal: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699339514; cv=none; b=irObjdxo9MVkUZgyHmDL8coY316E0TfCVdEqcmKIWy73zSjQDQGe3dqRbv9dK5Ue/nno/q5dNKIArH53S/m5bkKd44x90fON499TbijbKXcdETqbeJEoWDaR7BD1oqNUCCM+a4ACikzkjNuoDvq79CJXcyayDUBvtDhfYjRwRlg= ARC-Message-Signature: i=1; a=rsa-sha256; d=sourceware.org; s=key; t=1699339514; c=relaxed/simple; bh=tefhzf+lKMEFCm19dIKne2dGz8wpgz6FRq258kzpoKw=; h=DKIM-Signature:Date:From:To:Subject:Message-ID:MIME-Version; b=IgbM9r1rZjokzD8JcELhbUdeuaqZ4V56MJYPghtgTYaeSJ3XX7XReeZj6sD3bO+dcVrm+qSoAKMeoLz0u3tUzJITpsa560iWJ8d1OzfjG9d9OnUs3QYnOZAdflYjIOkM9+1NlvCzGyIaOu8+Ti0T/tu0YoSL7C9vNJRm22K+4KQ= ARC-Authentication-Results: i=1; server2.sourceware.org Received: by mail-oi1-x22e.google.com with SMTP id 5614622812f47-3b2ea7cc821so3399402b6e.1 for ; Mon, 06 Nov 2023 22:45:11 -0800 (PST) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=osandov-com.20230601.gappssmtp.com; s=20230601; t=1699339511; x=1699944311; darn=sourceware.org; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:from:to:cc:subject:date:message-id:reply-to; bh=4Fgsq/8sIhsxyqzfU08Wzq/k1aj+bfOanonpwlmABx0=; b=hQd+YGaK82FxpWr/lT94wzfeacKllnl/9gkvsFCPQ9t9+fffj5Oy46GXTvwZdFP1Tr d1iCRkruYQryeBFepHuqeDLdpM8lgQRS6VubCv2S7RoyZgxW98HyNvD0ZCrt3zZ1uW1M E6vtlPkPEp8qbvTmZO4nYjrDIQgljR5fRuuICPC55pri7a5RY1iNx6EyD3HOtQNEJsx0 b7AQHiJ3cagR3tPdcxLnbVSfgaXH9gnjrlZfpI0r1uEwiYiD+LLrHknHPoSl8mj5/noe 9VKIwivL4obu6aY0iikkphGN+LhIGqvM6vbDOxrKO805zWqGl4cwecYvAFts2HwuoTP0 tenw== X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=1e100.net; s=20230601; t=1699339511; x=1699944311; h=in-reply-to:content-disposition:mime-version:references:message-id :subject:cc:to:from:date:x-gm-message-state:from:to:cc:subject:date :message-id:reply-to; bh=4Fgsq/8sIhsxyqzfU08Wzq/k1aj+bfOanonpwlmABx0=; b=JMCKJgEXOTFcf2dT7Rqi07m6uqKXpq0R7Vm29qhV3WH/yb46A/831pYkLEsKLhlxUo bq7JWvHtQ74YPYx4xHIcVR+AbQFhqiHrll7n90kLPVWY2D+M+6hB7eDb/65M4IxbPlzS bUFa391+E7ir47ZSI9w+Pr/uNvrynAkIwG2dHJLJLiVjqw4Ifxd2plNX9l7ZCVByT4t0 EWtIHCEwllnqmHxjk1u606uITMtKHNqOtagrcTAjcvK9zjjaVZoIWMbTzJpA32YFd6qU 7ouQDSEQb66HyPoBDhRenu4jT802Spt//QFulL9GHlC0rkvhqX9eeGEHCTJN6ajmv/qB xNyA== X-Gm-Message-State: AOJu0Yyql5Wq33xr2FDhlZgknlvIDmwmyXHpFEhJXwBZAwr1ckgJBXEr r3WXuCIwRBlkl/6Tul4+Pj/lkm7smqubzbpQKcI= X-Google-Smtp-Source: AGHT+IHFtvdOVYMQT+ewqDkg2EHx0Ohf/PAqaKoUaXRRIO6bs8/QpkOpvn4G7sXYn58bk4Oc3ronxw== X-Received: by 2002:a05:6808:1889:b0:3a9:bb4f:9efd with SMTP id bi9-20020a056808188900b003a9bb4f9efdmr40029606oib.29.1699339511050; Mon, 06 Nov 2023 22:45:11 -0800 (PST) Received: from telecaster ([2620:10d:c090:400::4:cb49]) by smtp.gmail.com with ESMTPSA id t62-20020a628141000000b0068fe39e6a46sm6851879pfd.112.2023.11.06.22.45.09 (version=TLS1_3 cipher=TLS_AES_256_GCM_SHA384 bits=256/256); Mon, 06 Nov 2023 22:45:10 -0800 (PST) Date: Mon, 6 Nov 2023 22:45:08 -0800 From: Omar Sandoval To: Mark Wielaard Cc: elfutils-devel@sourceware.org Subject: Re: [PATCH 08/14] libdw: Parse DWARF package file index sections Message-ID: References: <20231101230704.GN8429@gnu.wildebeest.org> MIME-Version: 1.0 Content-Type: text/plain; charset=us-ascii Content-Disposition: inline In-Reply-To: <20231101230704.GN8429@gnu.wildebeest.org> X-Spam-Status: No, score=-5.3 required=5.0 tests=BAYES_00,DKIM_SIGNED,DKIM_VALID,RCVD_IN_DNSWL_NONE,SPF_HELO_NONE,SPF_NONE,TXREP,T_SCC_BODY_TEXT_LINE autolearn=ham autolearn_force=no version=3.4.6 X-Spam-Checker-Version: SpamAssassin 3.4.6 (2021-04-09) on server2.sourceware.org List-Id: On Thu, Nov 02, 2023 at 12:07:04AM +0100, Mark Wielaard wrote: > Hi Omar, > > On Wed, Sep 27, 2023 at 11:20:57AM -0700, Omar Sandoval wrote: > > The .debug_cu_index and .debug_tu_index sections in DWARF package files > > are basically hash tables mapping a unit's 8 byte signature to an offset > > and size in each section used by that unit [1]. Add support for parsing > > and doing lookups in the index sections. > > > > We look up a unit in the index when we intern it and cache its hash > > table row in Dwarf_CU. > > This looks good. Thanks for the various testcases. > > Do I understand correctly that binutils dwp only does the DWARF4 GNU > extension and llvm-dwp only does the DWARF5 standardized format? Maybe > we should create a eu-dwp that does both. llvm-dwp can do both the DWARF4 GNU format and the DWARF5 format. binutils dwp can only do DWARF4. > It looks like you are very careful checking boundaries, but it would > be bad to do some fuzzing on this. > > > Then, a new function, dwarf_cu_dwp_section_info, > > can be used to look up the section offsets and sizes for a unit. This > > will mostly be used internally in libdw, but it will also be needed in > > static inline functions shared with eu-readelf. Additionally, making it > > public it makes dwp support much easier for external tools that do their > > own low-level parsing of DWARF information, like drgn [2]. > > Although I am not against this new interface, I am not super > enthousiastic about it. > > There is one odd part, DW_SECT_TYPES should be defined in dwarf.h with > the other DW_SECT constants, otherwise it cannot be used (unless you > define it yourself to 2). Yeah, I wasn't sure about adding it or not since it's not technically defined in the DWARF5 standard. But if we can count on future DWARF standards not reusing the value 2 (which we probably can), I agree that it seems better to add it to dwarf.h. > For eu-readelf we don't really need it being public, it already cheats > a little and uses some (non-public) libdwP.h functions. That is > actually not great either. So if there is a public function that is > available that is actually preferred. But if it is just for > eu-readelf, I don't think it is really needed. And it seems you > haven't actually added the support to eu-readelf to print these > offsets. Right, eu-readelf only uses it indirectly via the static inline functions modified in patch 13. It looks like eu-readelf dynamically links to libdw (on Fedora, at least), so either some part of the dwp implementation needs to be exported, or a big chunk of it needs to be inlined in libdwP.h, right? Either way, I'd still love to have this interface for drgn. > It is fine to expose these offsets and sizes, but how exactly are you > using them in drgn? It seems we don't have any other interfaces in > libdw that you can then use these with. Here's the branch I linked to in my cover letter: https://github.com/osandov/drgn/tree/dwp. I need this interface for the couple of places where drgn parses DWARF itself. One of those places is in the global name indexing step drgn does at startup. We do this by enumerating all CUs using libdw, then using our own purpose-built DWARF parser to do a fast, parallelized scan. Our bespoke parser needs to know the base of the abbreviation table and the string offsets table, which is easy without dwp. This interface also makes it easy with dwp. Without this interface, I'd need to reimplement the CU index parser in drgn :( > Can we split off this public interface from the rest of this patch? > But then we also need to split off the tests. So maybe keep them together? Yeah, I included the interface in this patch exactly so that I could test the implementation in the same patch. I'm happy to split that up however you'd prefer. [snip] > > + Dwarf_Package_Index *index = malloc (sizeof (*index)); > > + if (index == NULL) > > + { > > + __libdw_seterrno (DWARF_E_NOMEM); > > + return NULL; > > + } > > + > > + index->dbg = dbg; > > + /* Set absent sections to UINT32_MAX. */ > > + memset (index->sections, 0xff, sizeof (index->sections)); > > OK, although I would have preferred a simple for loop and let the > compiler optimize it. Ok, I can fix that. [snip] > > +static int > > +__libdw_dwp_section_info (Dwarf_Package_Index *index, uint32_t unit_row, > > + unsigned int section, Dwarf_Off *offsetp, > > + Dwarf_Off *sizep) > > +{ > > + if (index == NULL) > > + return -1; > > + if (unit_row == 0) > > + { > > + __libdw_seterrno (DWARF_E_INVALID_DWARF); > > + return -1; > > + } > > OK, but why isn't the caller checking this? Partially becase it simplified callers like __libdw_dwp_findcu_id to not require a separate check, and partially just to be defensive. [snip] Thanks for taking a look!