From: Aaron Merey
Date: Wed, 21 Feb 2024 22:19:10 -0500
Subject: Re: [PATCH v2] dwarf_getaranges: Build aranges list from CUs instead of .debug_aranges
To: Mark Wielaard
Cc: elfutils-devel@sourceware.org

Hi Mark,

On Tue, Feb 20, 2024 at 5:23 PM Mark Wielaard wrote:
>
> > As for the number of aranges found, there is a difference for libxul.so:
> > 250435 with the patch compared to 254832 without. So 4397 fewer aranges
> > are found when using the new CU iteration method. I'll dig into this and
> > see if there is a problem or if it's just due to some redundancy in
> > libxul's .debug_aranges. FWIW there was no change to the aranges counts
> > for the other modules searched during this eu-stack firefox corefile test.
>
> A quick way to see where the differences are is using
> eu-readelf --debug-dump=decodedaranges before/after your patch.
>
> This is opposite to what I expected. I had expected there to be more,
> instead of less ranges. The difference is less than 2%. But still
> interesting to know what/why.
>
> Were there any differences in the backtraces? If not, then those
> ranges might not actually have been mapping to code.

The backtraces were identical.

I took a closer look at this and the difference is due to Clang
including DW_AT_location addresses in .debug_aranges. This patch just
looks for DW_AT_{high,low}_pc and DW_AT_ranges (via dwarf_ranges) when
generating the aranges list, so these DW_AT_location addresses aren't
included.
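A minimal, self-contained C sketch of the CU-based approach described above, modelled without libdw so that it compiles on its own. The types `struct range`, `struct cu_ranges`, `struct arange` and the function `build_aranges` are hypothetical and not part of libdw; the sketch assumes each CU's [low, high) code ranges have already been collected (in libdw terms, by calling dwarf_ranges on the CU DIE). Because only code ranges are fed in, addresses that appear solely in DW_AT_location never enter the table.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* One [low, high) code range of a CU (hypothetical type).  */
struct range
{
  uint64_t low;
  uint64_t high;
};

/* The ranges already collected for one CU (hypothetical type).  */
struct cu_ranges
{
  uint64_t die_offset;        /* CU DIE offset in .debug_info.  */
  const struct range *ranges;
  size_t nranges;
};

/* One generated entry: start, length and CU DIE offset, the same fields
   eu-readelf shows for a decoded arange.  */
struct arange
{
  uint64_t start;
  uint64_t length;
  uint64_t cu_offset;
};

/* qsort comparator: order entries by start address.  */
static int
arange_cmp (const void *p, const void *q)
{
  const struct arange *a = p;
  const struct arange *b = q;
  if (a->start != b->start)
    return a->start < b->start ? -1 : 1;
  return 0;
}

/* Flatten the ranges of all NCUS units into OUT (room for CAP entries),
   sorted by start address.  Returns the number of entries written.  */
static size_t
build_aranges (const struct cu_ranges *cus, size_t ncus,
               struct arange *out, size_t cap)
{
  size_t n = 0;
  for (size_t i = 0; i < ncus; i++)
    for (size_t j = 0; j < cus[i].nranges; j++)
      {
        const struct range *r = &cus[i].ranges[j];
        assert (n < cap);
        /* An empty range such as [1, 1) is kept as a length-0 entry;
           a malformed range with high < low is clamped to length 0.  */
        uint64_t length = r->high > r->low ? r->high - r->low : 0;
        out[n++] = (struct arange) { r->low, length, cus[i].die_offset };
      }
  qsort (out, n, sizeof out[0], arange_cmp);
  return n;
}
```

For instance, a CU at DIE offset 0xb with ranges [0x2000, 0x2040) and [0x1000, 0x1010) contributes two entries, which end up interleaved by start address with those of other CUs. Sorting by start address is what makes a later binary-search lookup by address possible.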
According to David Blaikie [1], "GCC does not include globals in
debug_aranges, so they probably can't be relied upon unfortunately
(unfortunate that Clang pays the cost when it emits them, but consumers
can't rely on them)". Since consumers already can't assume that
.debug_aranges includes DW_AT_location addresses, I think it's
reasonable for us not to include them when dynamically generating
aranges, especially since doing so could carry a noticeable performance
cost.

A separate issue I saw is that some .debug_aranges entries with a
decoded start address of "000000000000000000" (i.e. address 0) don't
always have a matching entry in .debug_ranges. As a result, some
dynamically generated aranges don't match their corresponding
.debug_aranges entry.

For example, the following decoded arange is in libxul's .debug_aranges:

  start: 000000000000000000, length: 13, CU DIE offset: 95189496

In the .debug_info for this CU, there is a corresponding subprogram
with low_pc 0 and high_pc 13. However, the .debug_ranges entry for
this CU has no range starting at address 0 with a size of at least 13.
The closest range list entry is:

  range 1, 1
  +0x0000000000000001 <__ehdr_start+0x1>..
  +000000000000000000

When we dynamically generate aranges, this results in the following
decoded arange:

  start: 0x0000000000000001, length: 0, CU DIE offset: 95189496

So the start address is off by one and the length (0 instead of 13)
doesn't match the .debug_aranges entry. In some similar cases there
isn't even a corresponding "range 1, 1" entry in .debug_ranges. I'm
not yet sure what's going on here.

>
> > > Might it be an idea to leave dwarf_getaranges as it is and introduce a
> > > new (internal) function to get "dynamic" ranges? It looks like what
> > > programs (like eu-stack and eu-addr2line) really use is dwarf_addrdie
> > > and dwfl_module_addrdie. These are currently build on dwarf_getaranges,
> > > but could maybe use a new interface?
> >
> > IMO this depends on what users expect from dwarf_getaranges.
> > Do they want the exact contents of .debug_aranges (whether or not it's
> > complete) or should dwarf_getaranges go beyond .debug_aranges to ensure
> > the most complete results?
> >
> > The comment for dwarf_getaranges in libdw.h simply reads "Return list
> > address ranges". Since there's no mention of .debug_aranges specifically,
> > I think it's fair if dwarf_getaranges does whatever it can to ensure
> > comprehensive results. In which case dwarf_getaranges should probably
> > dynamically generate aranges.
>
> You might be right that no user really cares. But as seen in the
> eu-readelf code, it might also be that people expected it to map to
> the ranges from .debug_aranges.
>
> So I would be happier if we just kept the dwarf_getaranges code as
> is. And just change the code in dwarf_addrdie and dwfl_module_addrdie.
>
> We could then also introduce a new public function, dwarf_getdieranges
> (?) that does the new thing. But it doesn't have to be public on the
> first try as long as dwarf_addrdie and dwfl_module_addrdie work. (We
> might want to change the interface of dwarf_getdieranges so it can be
> "lazy" for example.)

OK, this approach seems the most flexible. Users can have both
.debug_aranges and CU-based aranges, and we don't have to change the
semantics of dwarf_getaranges. I'll submit a revised patch for this.

Aaron

[1] https://reviews.llvm.org/D123538#inline-1188522
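As a rough, hypothetical illustration of what dwarf_addrdie or dwfl_module_addrdie would need from a CU-based arange table, the following C sketch maps an address to the covering CU's DIE offset with a binary search. `struct arange` and `lookup_cu` are illustrative names, not libdw API, and the sketch assumes the table is sorted by start address, with distinct start addresses and no overlapping entries.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Same shape as a decoded arange: start, length, CU DIE offset.  */
struct arange
{
  uint64_t start;
  uint64_t length;
  uint64_t cu_offset;
};

/* Hypothetical lookup: find the entry of TABLE (N entries, sorted by start,
   distinct starts, no overlaps) that contains ADDR.  On success store the
   CU DIE offset in *CU_OFFSET and return true.  */
static bool
lookup_cu (const struct arange *table, size_t n, uint64_t addr,
           uint64_t *cu_offset)
{
  assert (cu_offset != NULL);

  /* Binary search: afterwards LO is the number of entries whose start
     address is <= ADDR.  */
  size_t lo = 0, hi = n;
  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (table[mid].start <= addr)
        lo = mid + 1;
      else
        hi = mid;
    }

  if (lo == 0)
    return false;               /* ADDR is below the first entry.  */

  /* Only the last entry starting at or below ADDR can contain it.  Compare
     the offset against the length to avoid overflow in start + length.  */
  const struct arange *a = &table[lo - 1];
  if (addr - a->start >= a->length)
    return false;               /* ADDR lies in a gap or a length-0 entry.  */

  *cu_offset = a->cu_offset;
  return true;
}
```

With the entries { start 0x1000, length 0x10, CU 0xb } and { start 0x1800, length 0x100, CU 0x100 }, address 0x1008 maps to CU 0xb while 0x1010 falls in the gap and is reported as not found. A length-0 entry, like the one generated from "range 1, 1" above, never matches any address.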