From: Nix
To: Jakub Jelinek
Cc: Richard Biener, Indu Bhagat, Jeff Law, GCC Patches
Subject: Re: [PATCH,RFC,V3 0/5] Support for CTF in GCC
Date: Mon, 08 Jul 2019 14:16:00 -0000
In-Reply-To: <20190705191108.GR815@tucnak> (Jakub Jelinek's message of "Fri, 5 Jul 2019 21:11:08 +0200")
Message-ID: <87ftngwrek.fsf@esperi.org.uk>

On 5 Jul 2019, Jakub Jelinek outgrape:

> On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
>> > What makes it superior to DWARF stripped down to the above feature set?
>>
>> Increased compactness.
>> DWARF fundamentally trades off compactness in favour of its regular
>> structure, which makes it easier to parse (but not easier to
>> interpret) but very hard to make it much smaller than it is now.
>> Where DWARF uses word-sized and larger entities for everything, CTF
>> packs everything much more tightly -- and this is quite visible in the
>
> That is just not true, most of the data in DWARF are uleb128/sleb128
> encoded, or often even present just in the abbreviation table
> (DW_FORM_flag_present, DW_FORM_implicit_const), word-sized is typically
> only stuff that needs relocations (at assembly time and more
> importantly at link time).

Hm. I may have misread the spec.

The fact remains that DWARF is (in typical usage) both large and slow to
use: it is not entirely untrue to say that you can spot a DWARF consumer
because it takes ten seconds to start up. This may be something that can
be avoided with sufficiently clever implementations, but I've never seen
any such implementation and we don't appear to be approaching one
terribly fast :(

Meanwhile, in CTF we already have a working system that can reduce
multigigabyte DWARF input down to 6MiB of compressed CTF that loads in
fractions of a second, though it is true that not all of that input was
global-scope type info, so a large portion of that multigigabyte input
would simply have been dropped and should not be considered relevant.
I'm not sure how to determine how much of the input is type DIEs at
global scope... (The 6MiB figure is slightly misleading, too, since only
1439845 bytes of that is type data: the rest is mostly compressed string
table.)

Possibly sufficiently clever deduplication can do a similar scrunching
job for DWARF, but I note that what DWARF deduplication GCC did in
earlier releases has subsequently been removed because it never really
worked very well. (Having written code that deduplicates DWARF, I can
see why: it's a complex job even when you just have C to think about.
Doing it for C++ as well must have made people's brains dribble out of
their ears).

Type signatures in DWARF 4 were supposed to provide this sort of thing,
too, but yet again the promise does not seem to have been borne out:
DWARF debuginfo remains immense and there is no discussion of leaving
unstripped binaries on production systems for the sake of continuous
tracing tools or introspection, because the debuginfo in those binaries
would still be many times the size of the binaries they relate to, and
obviously leaving it unstripped in that case is ridiculous.

Meanwhile, FreeBSD has a leg-up in continuous debugging because they
generate (an older form of) CTF for everything and deduplicate it, and
it's small enough that they can leave it linked into the binaries rather
than stripping it out, and tracers can and do use it. I'm trying to give
us all that advantage, while not leaving us tied to a format with as
many limitations as FreeBSD's CTF.

As a side note, I tried switching to ULEB128 for the representations of
unsigned integers in CTF a while back, but never even pushed it anywhere
because while it shrank the output a little, the compressed sizes
worsened noticeably, by about 10%, and we don't want to hurt the
compressed sizes any more than we do the uncompressed ones. I found this
quite annoying. So I'm not convinced that ULEB actually buys you much of
anything once compressors get into the mix.

Something similar happened when I tried to do clever things with string
tables last month, sharing common string suffixes, slicing strtabs up on
underscores and changes of case and replacing strings where beneficial
with offset tables pointing into the sliced-up pieces: the uncompressed
size shrank by about 50% and the compressed size grew by 20%... I found
this *very* annoying. :)

> For DWARF you also have various techniques at reducing size and
> optimizing redundancies away, look e.g. at the dwz utility.

... interesting!
I'll be looking through this and seeing if any of it is applicable to CTF as well, that's for sure.
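
For anyone who wants to repeat the ULEB128-versus-fixed-width comparison
described above on their own data, here is a minimal sketch. It is
illustrative Python, not libctf code: the helper names (uleb128_encode,
uleb128_decode, pack_fixed, pack_uleb, size_report) and the sample
values are invented for this example, 32-bit little-endian words are
assumed for the fixed-width form, and zlib merely stands in for whatever
compressor is really in use. Which encoding wins after compression
depends on the input, so no particular result is claimed.

```python
import struct
import zlib


def uleb128_encode(value):
    """Encode a non-negative integer as ULEB128 (7 payload bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def uleb128_decode(data, offset=0):
    """Decode one ULEB128 value; return (value, offset of the next byte)."""
    result = 0
    shift = 0
    while True:
        byte = data[offset]
        offset += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, offset
        shift += 7


def pack_fixed(values):
    """Pack values as fixed-width 32-bit little-endian words (an assumption)."""
    return struct.pack("<%dI" % len(values), *values)


def pack_uleb(values):
    """Pack the same values as a stream of ULEB128 integers."""
    return b"".join(uleb128_encode(v) for v in values)


def size_report(values):
    """Return raw and zlib-compressed sizes for both encodings."""
    fixed = pack_fixed(values)
    uleb = pack_uleb(values)
    return {
        "fixed_raw": len(fixed),
        "fixed_zlib": len(zlib.compress(fixed, 9)),
        "uleb_raw": len(uleb),
        "uleb_zlib": len(zlib.compress(uleb, 9)),
    }


if __name__ == "__main__":
    # Made-up sample: regularly spaced offsets, NOT real CTF data.
    sample = [i * 12 for i in range(1000)]
    for name, size in size_report(sample).items():
        print(name, size)
```

Feeding size_report() the integer fields of a real type section, rather
than the synthetic sample, is the only way to see whether the effect
reported above shows up for a given data set.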