From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <gcc-patches-return-504613-listarch-gcc-patches=gcc.gnu.org@gcc.gnu.org>
Received: (qmail 11656 invoked by alias); 8 Jul 2019 14:08:12 -0000
Mailing-List: contact gcc-patches-help@gcc.gnu.org; run by ezmlm
Precedence: bulk
List-Id: <gcc-patches.gcc.gnu.org>
List-Archive: <http://gcc.gnu.org/ml/gcc-patches/>
List-Post: <mailto:gcc-patches@gcc.gnu.org>
List-Help: <mailto:gcc-patches-help@gcc.gnu.org>
Sender: gcc-patches-owner@gcc.gnu.org
Received: (qmail 11648 invoked by uid 89); 8 Jul 2019 14:08:12 -0000
Authentication-Results: sourceware.org; auth=none
X-Spam-SWARE-Status: No, score=-3.1 required=5.0 tests=AWL,BAYES_00 autolearn=ham version=3.3.1 spammy=grew, remains, applicable, replacing
X-HELO: mail.esperi.org.uk
Received: from icebox.esperi.org.uk (HELO mail.esperi.org.uk) (81.187.191.129) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Mon, 08 Jul 2019 14:08:10 +0000
Received: from loom (nix@sidle.srvr.nix [192.168.14.8])	by mail.esperi.org.uk (8.15.2/8.15.2) with ESMTP id x684a2dJ008454;	Mon, 8 Jul 2019 05:36:02 +0100
From: Nix <nix@esperi.org.uk>
To: Jakub Jelinek <jakub@redhat.com>
Cc: Richard Biener <richard.guenther@gmail.com>,        Indu Bhagat <indu.bhagat@oracle.com>, Jeff Law <law@redhat.com>,        Indu Bhagat <ibhagatgnu@gmail.com>,        GCC Patches <gcc-patches@gcc.gnu.org>
Subject: Re: [PATCH,RFC,V3 0/5] Support for CTF in GCC
References: <1561617445-9328-1-git-send-email-indu.bhagat@oracle.com>	<CAJiQQZZP+gZPqSc3RwpP5Ybs_p1gTjrJf6AKtymg87gFFEpwPg@mail.gmail.com>	<0086f709-b8e5-fc73-1679-4a39e0f4e673@redhat.com>	<CAFiYyc3M7h64_S6eVRLnHzK0VPUttBRT9yeQYd8Q30eVzBrEgQ@mail.gmail.com>	<c6ab390f-bb01-b706-c408-494d8e22f1e0@oracle.com>	<CAFiYyc3EHPtGGrYEqPManhbkRmcETb9jV5wPUsMRVBitJOpz2g@mail.gmail.com>	<755cd109-f02b-3ebd-762f-71ae570bf21a@oracle.com>	<CAFiYyc2Yv32JAhG-3Vpp2yuO-9PZkjcqFGS5aqxWN-DAxsP0dw@mail.gmail.com>	<87k1cwwd37.fsf@esperi.org.uk> <20190705191108.GR815@tucnak>
Date: Mon, 08 Jul 2019 14:16:00 -0000
In-Reply-To: <20190705191108.GR815@tucnak> (Jakub Jelinek's message of "Fri, 5	Jul 2019 21:11:08 +0200")
Message-ID: <87ftngwrek.fsf@esperi.org.uk>
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/26.1.50 (gnu/linux)
MIME-Version: 1.0
Content-Type: text/plain
X-DCC--Metrics: loom 1481; Body=6 Fuz1=6 Fuz2=6
X-IsSubscribed: yes
X-SW-Source: 2019-07/txt/msg00581.txt.bz2

On 5 Jul 2019, Jakub Jelinek outgrape:

> On Fri, Jul 05, 2019 at 07:28:12PM +0100, Nix wrote:
>> > What makes it superior to DWARF stripped down to the above feature set?
>> 
>> Increased compactness. DWARF fundamentally trades off compactness in
>> favour of its regular structure, which makes it easier to parse (but not
>> easier to interpret) but very hard to make it much smaller than it is
>> now. Where DWARF uses word-sized and larger entities for everything, CTF
>> packs everything much more tightly -- and this is quite visible in the
>
> That is just not true, most of the data in DWARF are uleb128/sleb128
> encoded, or often even present just in the abbreviation table
> (DW_FORM_flag_present, DW_FORM_implicit_const), word-sized is typically only
> stuff that needs relocations (at assembly time and more importantly at link
> time).

Hm. I may have misread the spec.

The fact remains that DWARF is (in typical usage) both large and slow to
use: it is not entirely untrue to say that you can spot a DWARF consumer
because it takes ten seconds to start up. This may be something that can
be avoided with sufficiently clever implementations, but I've never seen
any such implementation and we don't appear to be approaching one
terribly fast :(

Meanwhile, in CTF we already have a working system that can reduce
multigigabyte DWARF input down to 6MiB of compressed CTF that loads in
fractions of a second. It is true that not all of that input was
global-scope type info, so a large portion of it would simply have been
dropped and should not be considered relevant. I'm not sure how to
determine how much of the input is type DIEs at global scope... (The
6MiB figure is slightly misleading, too, since only 1439845 bytes of
that is type data: the rest is mostly compressed string table.)

Possibly sufficiently clever deduplication can do a similar scrunching
job for DWARF, but I note that what DWARF deduplication GCC did in
earlier releases has subsequently been removed because it never really
worked very well. (Having written code that deduplicates DWARF, I can
see why: it's a complex job when you just have C to think about. Doing
it for C++ as well must have made people's brains dribble out of their
ears).

Type signatures in DWARF 4 were supposed to provide this sort of thing,
too, but yet again the promise does not seem to have been borne out:
DWARF debuginfo remains immense and there is no discussion of leaving
unstripped binaries on production systems for the sake of continuous
tracing tools or introspection, because the debuginfo in those binaries
would still be many times the size of the binaries they relate to, and
obviously leaving it unstripped in that case is ridiculous. Meanwhile,
FreeBSD has a leg-up in continuous debugging because they generate (an
older form of) CTF for everything and deduplicate it, and it's small
enough that they can leave it linked into the binaries rather than
stripping it out, and tracers can and do use it. I'm trying to give us
all that advantage, while not leaving us tied to a format with as many
limitations as FreeBSD's CTF.


As a side note, I tried switching to ULEB128 for the representations of
unsigned integers in CTF a while back, but never even pushed it anywhere
because while it shrank the output a little, the compressed sizes
worsened noticeably, by about 10%, and we don't want to hurt the
compressed sizes any more than we do the uncompressed ones. I found this
quite annoying. So I'm not convinced that ULEB128 actually buys you
much of anything once compressors get into the mix.
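
For anyone who wants to try this sort of comparison themselves, here is
a minimal sketch of ULEB128 encoding next to a fixed-width layout, with
raw and zlib-compressed sizes reported for each. The function names and
the use of 32-bit little-endian words as the fixed-width stand-in are
assumptions made for illustration: this is not the CTF on-disk layout
and not the code from the experiment described above, and the numbers
will depend entirely on the input you feed it.

```python
import struct
import zlib


def uleb128_encode(value):
    """Encode a non-negative integer as ULEB128 (7 data bits per byte)."""
    if value < 0:
        raise ValueError("ULEB128 encodes unsigned values only")
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # high bit set: more bytes follow
        else:
            out.append(byte)         # high bit clear: last byte
            return bytes(out)


def uleb128_decode(data, pos=0):
    """Decode one ULEB128 value at pos; return (value, next position)."""
    result = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if (byte & 0x80) == 0:
            return result, pos
        shift += 7


def sizes(values):
    """Return (raw, zlib-compressed) byte counts for both encodings."""
    # Hypothetical fixed-width layout: one 32-bit little-endian word each.
    fixed = b"".join(struct.pack("<I", v) for v in values)
    packed = b"".join(uleb128_encode(v) for v in values)
    return {
        "fixed": (len(fixed), len(zlib.compress(fixed, 9))),
        "uleb128": (len(packed), len(zlib.compress(packed, 9))),
    }
```

Calling sizes() on a list of real type IDs or offsets gives both the raw
and the compressed byte counts, so you can see whether the
variable-length form still comes out ahead once the compressor has had
its turn.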

Something similar happened when I tried to do clever things with string
tables last month, sharing common string suffixes, slicing strtabs up on
underscores and changes of case and replacing strings where beneficial
with offset tables pointing into the sliced-up pieces: the uncompressed
size shrank by about 50% and the compressed size grew by 20%... I found
this *very* annoying. :)
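
To make the simplest of those tricks concrete, here is a rough sketch of
suffix sharing in a NUL-terminated string table: a string that is a
suffix of another is not stored again, but simply points into the tail
of the longer one. This is the general technique only, not the
experimental code described above. The names build_strtab and lookup,
and the convention that offset 0 holds the empty string, are
assumptions made for illustration.

```python
def build_strtab(names):
    """Build a NUL-terminated string table with shared suffixes.

    Returns (table_bytes, offsets), where offsets maps name -> offset.
    """
    # Sorting by reversed bytes, descending, puts every string directly
    # after a longer string that ends with it, if such a string exists.
    encoded = sorted({n.encode("utf-8") for n in names},
                     key=lambda b: b[::-1], reverse=True)
    table = bytearray(b"\0")  # assumed convention: offset 0 is ""
    offsets = {}
    prev, prev_off = None, 0
    for s in encoded:
        if prev is not None and prev.endswith(s):
            # Reuse the tail of the previous string, NUL included.
            off = prev_off + len(prev) - len(s)
        else:
            off = len(table)
            table += s + b"\0"
        offsets[s.decode("utf-8")] = off
        prev, prev_off = s, off
    return bytes(table), offsets


def lookup(table, offset):
    """Read the NUL-terminated string starting at offset."""
    end = table.index(b"\0", offset)
    return table[offset:end].decode("utf-8")
```

With names such as "ctf_type_name", "type_name" and "name", only the
longest one is stored and the other two become offsets into it. That
shrinks the uncompressed table, but says nothing about what a compressor
will make of the result afterwards.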

> For DWARF you also have various techniques at reducing size and optimizing
> redundancies away, look e.g. at the dwz utility.

... interesting! I'll be looking through this and seeing if any of it is
applicable to CTF as well, that's for sure.