From: Nick Alcock
To: Indu Bhagat
Cc: Jakub Jelinek, Richard Biener, Pedro Alves, gcc-patches, mark@klomp.org
Subject: Re: Type representation in CTF and DWARF
References: <6ade3f1a-08a5-f8df-c53a-c98d60b0f12a@oracle.com> <088f530c-959b-ce76-5b87-8889e82f62df@oracle.com> <20191009074933.GA15914@tucnak> <34013601-e93d-c236-5cee-0f39f3026e1e@oracle.com>
Date: Thu, 17 Oct 2019 17:59:00 -0000
In-Reply-To: <34013601-e93d-c236-5cee-0f39f3026e1e@oracle.com> (Indu Bhagat's message of "Thu, 10 Oct 2019 16:05:53 -0700")
Message-ID: <87r23b8eav.fsf@esperi.org.uk>

On 11 Oct 2019, Indu Bhagat stated:

> Compile with -g -gdwarf-like-ctf and use dwz -o (using
> dwz compiled from the master branch) on the generated binaries:
>
> (coreutils-0.22)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> ls                     30616 |              1136 |          21098 |               26240 | 0.62
> pwd                    10734 |               788 |          10433 |               13929 | 0.83
> groups                 10706 |               811 |          10249 |               13378 | 0.80
>
> (emacs-26.3)
>              .debug_info(D1) | .debug_abbrev(D2) | .debug_str(D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> emacs-26.3.1          674657 |              6402 |         273963 |              273910 | 0.33

A side note here: the sizes given above are uncompressed sizes, but in
the real world CTF is almost always compressed: the threshold for
compression is in theory customizable but at the moment is hardwired at
4KiB-uncompressed
in the linker. I usually see compression ratios of roughly 3 or 4 to 1:
e.g. I just tried it with a randomly chosen binary,
/usr/lib/libgtk-3.so.0.2404.3, and got these sizes:

.text: 3317489
DWARF: 8589254
Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
.ctf section size: 213839

Note that this is not only in the absence of CTF strtab sharing with the
ELF dynstrtab, but also using a less effective compressor: currently we
use gzip, but I expect to transition to lzma iff available at binutils
build time (which it usually is), perhaps as an option (on by default)
to allow interoperability with binutils that don't have lzma available.
Obviously better compressors will save even more space.

It may help that CTF is designed for good compressibility: we try to
minimize the number of unique symbols if we can do so without impairing
other properties, e.g. by avoiding encoding IDs of objects when we can
instead rely on the consumer to compute them at read time by walking
through the relevant data structures and counting. A few benchmarks
indicate that compressing by default also saves time, at both
compression and decompression time.

(Within a week I should be able to repeat this with an ld capable of CTF
deduplication rather than kludging it with a deduplicator meant for a
quite different job. I expect the sizes above to improve. In fact if
they *don't* improve I will take this as strong evidence that my
deduplicator is buggy.)
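
To illustrate the point above about not encoding object IDs, here is a minimal Python sketch. It is not the real CTF on-disk layout: the record shape (a 16-bit kind plus a 32-bit name offset) and the function names are assumptions made up for the example. It only shows the general idea that a reader can recover 1-based IDs by walking the records and counting, so the writer need not store them.

```python
import struct

# Hypothetical toy record: (kind, name_offset). NOT the real CTF layout.
EXPLICIT = "<IHI"   # id, kind, name offset: 10 bytes per record
IMPLICIT = "<HI"    # kind, name offset only: 6 bytes per record


def encode_with_ids(records):
    """Store every record together with its explicit 1-based ID."""
    return b"".join(
        struct.pack(EXPLICIT, index, kind, name)
        for index, (kind, name) in enumerate(records, start=1)
    )


def encode_implicit(records):
    """Store the records only; the ID is implied by position."""
    return b"".join(struct.pack(IMPLICIT, kind, name) for kind, name in records)


def decode_implicit(blob):
    """Recover IDs at read time by walking the records and counting."""
    size = struct.calcsize(IMPLICIT)
    return {
        index: struct.unpack_from(IMPLICIT, blob, offset)
        for index, offset in enumerate(range(0, len(blob), size), start=1)
    }


records = [(1, 0), (2, 5), (1, 9)]
blob = encode_implicit(records)
print(len(encode_with_ids(records)), len(blob))
print(decode_implicit(blob))
```

In this toy encoding the implicit form is smaller before any compression is applied, and it also avoids a column of ever-changing ID values in the byte stream. How much that helps a real compressor on real CTF data is not something this sketch measures.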
FWIW, here are my Emacs (26.1.50) sizes, again with no strtab sharing,
but with deduplication: it's bigger than I'd like at around 10% of .text
size, but still much less than 1% of binary size (my goal is 1--2% of
.text, but Emacs is a nice tricky case, like Gtk, with lots of big types
and structures with long member names):

section                   size       addr
.interp                     28    4194872
.note.ABI-tag               32    4194900
.note.gnu.build-id          36    4194932
.gnu.hash                  628    4194968
.dynsym                  24432    4195600
.dynstr                  16934    4220032
.gnu.version              2036    4236966
.gnu.version_r             704    4239008
.rela.data.rel.ro           72    4239712
.rela.data                 168    4239784
.rela.got                   48    4239952
.rela,bss                  336    4240000
.rela.plt                23448    4240336
.init                       23    4263784
.plt                     15648    4263808
.text                  1912622    4279456
.fini                        9    6192080
.rodata                 165416    6192096
.eh_frame_hdr            36196    6357512
.eh_frame               210976    6393712
.init_array                  8    6609328
.fini_array                  8    6609336
.data.rel.ro              4569    6609344
.dynamic                  1104    6613920
.got                        16    6615024
.got.plt                  7840    6615040
.data                  3276077    6622880
,bss                  34153472    9899008
.comment                    26          0
.gnu_debuglink              24          0
.comment                    26          0
.debug_aranges            1536          0
.debug_info            3912261          0
.debug_abbrev            38821          0
.debug_line             408063          0
.debug_str              117631          0
.debug_loc              954538          0
.debug_ranges           149590          0
.ctf                    213839          0
.ctf (uncompressed)     713264          0

(Obviously manually edited a bit: size -A doesn't produce the last line
on its own!)

(I'm not sure what the hell is going on with the weirdly-named ,bss
section. Probably something to do with unexec().)
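
For anyone who wants to redo the arithmetic on the table above, here is a small Python sketch. The three sizes are copied from the table; the dictionary and variable names are only illustrative. It computes the compressed-to-uncompressed ratio of the .ctf section and the compressed .ctf size as a share of .text.

```python
# Section sizes in bytes, copied from the table above.
SIZES = {
    ".text": 1912622,
    ".ctf": 213839,
    ".ctf (uncompressed)": 713264,
}

# How many uncompressed bytes each compressed byte stands for.
compression_ratio = SIZES[".ctf (uncompressed)"] / SIZES[".ctf"]

# Compressed CTF as a fraction of the code size.
share_of_text = SIZES[".ctf"] / SIZES[".text"]

print(f"compression ratio: {compression_ratio:.2f} to 1")
print(f".ctf as share of .text: {share_of_text:.1%}")
```

With these inputs the compression ratio comes out a little above 3.3 to 1 and the .ctf section at roughly 11% of .text, in line with the "roughly 3 or 4 to 1" and "around 10% of .text" figures in the text.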