public inbox for gcc-patches@gcc.gnu.org
From: Richard Biener <richard.guenther@gmail.com>
To: Nick Alcock <nick.alcock@oracle.com>
Cc: Indu Bhagat <indu.bhagat@oracle.com>,
	Jakub Jelinek <jakub@redhat.com>,
	Pedro Alves <palves@redhat.com>,
	gcc-patches <gcc-patches@gcc.gnu.org>,
	mark@klomp.org
Subject: Re: Type representation in CTF and DWARF
Date: Thu, 17 Oct 2019 18:09:00 -0000
Message-ID: <CAFiYyc3ehvZzX6+ooDoiPtLTUHScjZrhkaDQ+giY4cvYyBO+_w@mail.gmail.com>
In-Reply-To: <87r23b8eav.fsf@esperi.org.uk>

On Thu, Oct 17, 2019 at 7:36 PM Nick Alcock <nick.alcock@oracle.com> wrote:
>
> On 11 Oct 2019, Indu Bhagat stated:
> > Compile with -g -gdwarf-like-ctf and use dwz -o <binary_dwz> <binary> (using
> > dwz compiled from the master branch) on the generated binaries:
> >
> > (coreutils-0.22)
> > binary | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > ls     | 30616            | 1136               | 21098           | 26240               | 0.62
> > pwd    | 10734            | 788                | 10433           | 13929               | 0.83
> > groups | 10706            | 811                | 10249           | 13378               | 0.80
> >
> > (emacs-26.3)
> > binary       | .debug_info (D1) | .debug_abbrev (D2) | .debug_str (D4) | .ctf (uncompressed) | ratio (.ctf/(D1+D2+0.5*D4))
> > emacs-26.3.1 | 674657           | 6402               | 273963          | 273910              | 0.33
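
The ratio column above can be reproduced with a few lines of Python. This is only a sketch: the byte counts are copied from the two tables, the weighting of .debug_str by 0.5 is taken from the column header as given, and the names `SIZES` and `ctf_ratio` are made up for illustration.

```python
# Sizes in bytes, copied from the tables above:
# (.debug_info D1, .debug_abbrev D2, .debug_str D4, uncompressed .ctf)
SIZES = {
    "ls": (30616, 1136, 21098, 26240),
    "pwd": (10734, 788, 10433, 13929),
    "groups": (10706, 811, 10249, 13378),
    "emacs-26.3.1": (674657, 6402, 273963, 273910),
}


def ctf_ratio(debug_info, debug_abbrev, debug_str, ctf):
    """Return .ctf / (D1 + D2 + 0.5 * D4), as in the table's last column."""
    return ctf / (debug_info + debug_abbrev + 0.5 * debug_str)


if __name__ == "__main__":
    for name, sizes in SIZES.items():
        print(f"{name:14s} {ctf_ratio(*sizes):.2f}")
```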

Btw, for a fair comparison you have to remove all DW_TAG_subprogram
children as well, since CTF doesn't represent scopes or local variables
at all (nor types only used by locals).  It seems CTF only represents
function entry points.
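
To make the suggested adjustment concrete, here is a toy sketch in Python. It does not read real DWARF: the DIE tree is a hand-written nested dict, and `strip_subprogram_children` and `count_dies` are hypothetical helpers. It drops every child of a function DIE (DW_TAG_subprogram in DWARF) and does not attempt the harder part, removing types that are only used by locals.

```python
def strip_subprogram_children(die):
    """Return a copy of `die` with the children of every DW_TAG_subprogram removed."""
    if die["tag"] == "DW_TAG_subprogram":
        children = []  # drop scopes, locals, and anything else nested in the function
    else:
        children = [strip_subprogram_children(c) for c in die.get("children", [])]
    return {"tag": die["tag"], "name": die.get("name"), "children": children}


def count_dies(die):
    """Count the DIEs in a tree, including the root."""
    return 1 + sum(count_dies(c) for c in die.get("children", []))


# Hand-written stand-in for a compile unit; not produced by any real tool.
cu = {
    "tag": "DW_TAG_compile_unit", "name": "demo.c",
    "children": [
        {"tag": "DW_TAG_base_type", "name": "int", "children": []},
        {"tag": "DW_TAG_subprogram", "name": "main", "children": [
            {"tag": "DW_TAG_variable", "name": "i", "children": []},
            {"tag": "DW_TAG_lexical_block", "name": None, "children": [
                {"tag": "DW_TAG_variable", "name": "tmp", "children": []},
            ]},
        ]},
    ],
}

if __name__ == "__main__":
    print(count_dies(cu), count_dies(strip_subprogram_children(cu)))
```

Comparing .debug_info sizes before and after such a pruning step is what would put the DWARF side on the same footing as CTF.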

> A side note here: the sizes given above are uncompressed sizes, but in
> the real world CTF is almost always compressed: the threshold for
> compression is in theory customizable but at the moment is hardwired at
> 4KiB-uncompressed in the linker. I usually see compression ratios of
> roughly 3 or 4 to 1: e.g. I just tried it with a randomly chosen binary,
> /usr/lib/libgtk-3.so.0.2404.3, and got these sizes:
>
> .text: 3317489
> DWARF: 8589254
> Uncompressed CTF (*no* ELF strtab sharing, so a bit bigger than usual): 713264
> .ctf section size: 213839
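
As a quick check of the "roughly 3 or 4 to 1" figure, the numbers quoted for libgtk can be plugged into a short Python sketch. The byte counts are copied from the message; the variable names are invented for illustration.

```python
# Byte counts quoted above for /usr/lib/libgtk-3.so.0.2404.3.
text_size = 3317489
dwarf_size = 8589254
ctf_uncompressed = 713264
ctf_compressed = 213839  # size of the .ctf section as emitted

compression_ratio = ctf_uncompressed / ctf_compressed
ctf_vs_text = ctf_compressed / text_size
ctf_vs_dwarf = ctf_compressed / dwarf_size

if __name__ == "__main__":
    print(f"compression ratio: {compression_ratio:.2f} to 1")
    print(f".ctf as a fraction of .text: {ctf_vs_text:.1%}")
    print(f".ctf as a fraction of DWARF: {ctf_vs_dwarf:.1%}")
```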
>
> Note that this is not only in the absence of CTF strtab sharing with the
> ELF dynstrtab, but also using a less effective compressor: currently we
> use gzip, but I expect to transition to lzma iff available at binutils
> build time (which it usually is), perhaps as an option (on by default)
> to allow interoperability with binutils that don't have lzma available.
> Obviously better compressors will save even more space.
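
The threshold and the choice of compressor can be illustrated with Python's standard library. This is a sketch under assumptions, not the linker's code: `maybe_compress` and `compare_compressors` are hypothetical names, zlib's DEFLATE stands in for the gzip compression mentioned above, and whether the real threshold test is "at least 4KiB" or "more than 4KiB" is a guess.

```python
import zlib

try:
    import lzma  # optional, mirroring "iff available at build time"
except ImportError:
    lzma = None

THRESHOLD = 4096  # 4KiB uncompressed, the threshold mentioned above


def maybe_compress(data, threshold=THRESHOLD):
    """Return (compressed?, payload); small inputs are left alone.

    Assumption: inputs of exactly `threshold` bytes are compressed.
    """
    if len(data) < threshold:
        return False, data
    return True, zlib.compress(data)


def compare_compressors(data):
    """Return the compressed size in bytes for each available compressor."""
    sizes = {"zlib": len(zlib.compress(data, 9))}
    if lzma is not None:
        sizes["lzma"] = len(lzma.compress(data))
    return sizes


if __name__ == "__main__":
    sample = b"struct _GtkWidget *widget; " * 400
    print(maybe_compress(b"tiny")[0], maybe_compress(sample)[0])
    print(compare_compressors(sample))
```

Which compressor wins, and by how much, depends on the input; the sample string here is far more repetitive than real CTF data.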
>
> It may help that CTF is designed for good compressibility: we try to
> minimize the number of unique symbols if we can do so without impairing
> other properties, e.g. by avoiding encoding IDs of objects when we can
> instead rely on the consumer to compute them at read time by walking
> through the relevant data structures and counting.
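
The idea of leaving IDs out of the data and letting the consumer recover them by counting can be sketched as follows. This is not the CTF on-disk layout: `TypeRecord`, its `ref` field, the "0 means no reference" convention, and the 1-based numbering are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class TypeRecord:
    name: str
    ref: int = 0  # ID of a referenced type; 0 means "none" (hypothetical convention)


def assign_ids(records):
    """Consumer side: a record's ID is its position in the stream, counted from 1.

    The records never store their own ID, so those bytes are not written at all.
    """
    return {type_id: rec for type_id, rec in enumerate(records, start=1)}


# Producer side: records are emitted in order, and references use positions.
records = [
    TypeRecord("int"),
    TypeRecord("int *", ref=1),       # refers to the first record
    TypeRecord("struct foo"),
]

if __name__ == "__main__":
    by_id = assign_ids(records)
    pointer = by_id[2]
    print(pointer.name, "->", by_id[pointer.ref].name)
```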
>
> A few benchmarks indicate that compression by default also saves time,
> both at compression and at decompression time.
>
> (Within a week I should be able to repeat this with an ld capable of CTF
> deduplication rather than kludging it with a deduplicator meant for a
> quite different job. I expect the sizes above to improve. In fact if
> they *don't* improve I will take this as strong evidence that my
> deduplicator is buggy.)
>
>
> FWIW, here are my Emacs (26.1.50) sizes, again with no strtab sharing, but
> with deduplication: it's bigger than I'd like at around 10% of .text
> size, but still much less than 1% of binary size (my goal is 1--2% of
> .text, but Emacs is a nice tricky case, like Gtk, with lots of big types
> and structures with long member names):
>
> section                  size      addr
> .interp                    28   4194872
> .note.ABI-tag              32   4194900
> .note.gnu.build-id         36   4194932
> .gnu.hash                 628   4194968
> .dynsym                 24432   4195600
> .dynstr                 16934   4220032
> .gnu.version             2036   4236966
> .gnu.version_r            704   4239008
> .rela.data.rel.ro          72   4239712
> .rela.data                168   4239784
> .rela.got                  48   4239952
> .rela,bss                 336   4240000
> .rela.plt               23448   4240336
> .init                      23   4263784
> .plt                    15648   4263808
> .text                 1912622   4279456
> .fini                       9   6192080
> .rodata                165416   6192096
> .eh_frame_hdr           36196   6357512
> .eh_frame              210976   6393712
> .init_array                 8   6609328
> .fini_array                 8   6609336
> .data.rel.ro             4569   6609344
> .dynamic                 1104   6613920
> .got                       16   6615024
> .got.plt                 7840   6615040
> .data                 3276077   6622880
> ,bss                 34153472   9899008
> .comment                   26         0
> .gnu_debuglink             24         0
> .comment                   26         0
> .debug_aranges           1536         0
> .debug_info           3912261         0
> .debug_abbrev           38821         0
> .debug_line            408063         0
> .debug_str             117631         0
> .debug_loc             954538         0
> .debug_ranges          149590         0
> .ctf                   213839         0
> .ctf (uncompressed)    713264         0
>
> (obviously, manually edited a bit, size -A doesn't produce the last line
> on its own!)
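
The "around 10% of .text" figure can be checked by parsing a listing in this format. A minimal sketch in Python: `parse_size_output` is a hypothetical helper, `SAMPLE` holds four rows copied from the listing above rather than fresh `size -A` output, and if a section name appears twice the later row overwrites the earlier one.

```python
SAMPLE = """\
section                  size      addr
.text                 1912622   4279456
.data                 3276077   6622880
.ctf                   213839         0
.ctf (uncompressed)    713264         0
"""


def parse_size_output(text):
    """Map section name to size for rows shaped like "name size addr"."""
    sections = {}
    for line in text.splitlines():
        # Split from the right so names containing spaces stay intact.
        parts = line.rsplit(None, 2)
        if len(parts) != 3 or not parts[1].isdigit() or not parts[2].isdigit():
            continue  # header or malformed row
        name, size, _addr = parts
        sections[name.strip()] = int(size)
    return sections


if __name__ == "__main__":
    sections = parse_size_output(SAMPLE)
    share = sections[".ctf"] / sections[".text"]
    print(f".ctf is {share:.1%} of .text")
```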
>
> (I'm not sure what the hell is going on with the weirdly-named ,bss
> section. Probably something to do with unexec().)
