public inbox for dwz@sourceware.org
From: "vries at gcc dot gnu.org" <sourceware-bugzilla@sourceware.org>
To: dwz@sourceware.org
Subject: [Bug default/24388] New: Disabling DIE deduplication improves compression for hello
Date: Tue, 01 Jan 2019 00:00:00 -0000
Message-ID: <bug-24388-11298@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=24388

            Bug ID: 24388
           Summary: Disabling DIE deduplication improves compression for
                    hello
           Product: dwz
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: default
          Assignee: nobody at sourceware dot org
          Reporter: vries at gcc dot gnu.org
                CC: dwz at sourceware dot org
  Target Milestone: ---

If we run dwz on a hello-world executable, the total size of the relevant debug
sections is reduced by 8%:
...
$ gcc hello.c -g
$ dwz hello -o hello.dwz
$ diff.sh hello hello.dwz 
.debug_info      red: 17%       1467    1221
.debug_abbrev    red: 7%        624     584
.debug_str       red: 0%        1619    1619
total            red: 8%        3710    3424
...

However, if we disable the DIE deduplication optimization, like so:
...
diff --git a/dwz.c b/dwz.c
index 045bda5..4b9b5e6 100644
--- a/dwz.c
+++ b/dwz.c
@@ -5038,8 +5038,8 @@ read_debug_info (DSO *dso, int kind)
          dump_dies (0, cu->cu_die);
 #endif

-         if (find_dups (cu->cu_die))
-           goto fail;
+         //if (find_dups (cu->cu_die))
+         //goto fail;
        }
       if (unlikely (kind == DEBUG_TYPES))
        {
@@ -11080,9 +11080,9 @@ dwz
       ret = read_dwarf (dso, quiet && outfile == NULL);
       if (ret)
        cleanup ();
-      else if (partition_dups ()
-              || create_import_tree ()
-              || (unlikely (fi_multifile)
+      else if (// partition_dups ()
+              // || create_import_tree ()
+              (unlikely (fi_multifile)
                   && (remove_empty_pus ()
                       || read_macro (dso)))
               || read_debug_info (dso, DEBUG_TYPES)
...

we get a better result instead, a total reduction of 12%:
...
$ dwz hello -o hello.dwz.2
$ diff.sh hello hello.dwz.2 
.debug_info      red: 20%       1467    1183
.debug_abbrev    red: 25%       624     474
.debug_str       red: 0%        1619    1619
total            red: 12%       3710    3276
...

It would be nice to pick up the 12% result here by generating this output as an
intermediate step and preferring it whenever it is smaller than the result of
the subsequent DIE deduplication optimization.
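
To make the selection concrete, here is a minimal sketch of the comparison,
assuming the section sizes of both candidate outputs are known. The struct and
function names are hypothetical and do not exist in dwz.c; the sketch only
compares the three sections measured above.

```c
#include <stddef.h>

/* Hypothetical summary of one candidate output; not a dwz data structure.  */
struct candidate_size
{
  size_t debug_info;
  size_t debug_abbrev;
  size_t debug_str;
};

/* Sum of the debug sections that diff.sh reports on above.  */
static size_t
candidate_total (const struct candidate_size *c)
{
  return c->debug_info + c->debug_abbrev + c->debug_str;
}

/* Return nonzero if the intermediate output (produced without DIE
   deduplication) is strictly smaller than the deduplicated output,
   in which case it should be preferred.  */
static int
prefer_intermediate (const struct candidate_size *intermediate,
		     const struct candidate_size *deduplicated)
{
  return candidate_total (intermediate) < candidate_total (deduplicated);
}
```

With the hello numbers above (3276 vs. 3424 bytes) this would keep the
intermediate output.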

I tried the same experiment with a cc1 (from pr24275, with the tentative fix
applied):
...
$ dwz cc1 -o cc1.dwz
$ diff.sh cc1 cc1.dwz 
.debug_info      red: 45%       111527248 61570632
.debug_abbrev    red: 41%       1722726    1030935
.debug_str       red: 0%        6609355    6609355
total            red: 43%       119859329 69210922
$ dwz cc1 -o cc1.dwz.2
$ diff.sh cc1 cc1.dwz.2 
.debug_info      red: 11%       111527248 100313798
.debug_abbrev    red: 11%       1722726    1542574
.debug_str       red: 0%        6609355    6609355
total            red: 10%       119859329 108465727
...
Here we see the opposite result: with DIE deduplication disabled, the total
reduction drops from 43% to 10%.

By disabling the intermediate step above some cut-off point (say, x DIEs), we
might be able to get:
- better compression for smaller programs,
- without spending noticeable extra time on smaller programs, and
- without spending any extra time on larger programs.
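
A minimal sketch of such a cut-off check, assuming the number of DIEs is known
before the intermediate step would run. The function name and the die_limit
parameter are hypothetical; no concrete value for the limit has been measured
or proposed here.

```c
#include <stddef.h>

/* Return nonzero if the intermediate (no-deduplication) output should be
   generated at all.  DIE_LIMIT is a hypothetical cut-off that would have
   to be tuned by measurement; a limit of 0 disables the step entirely.  */
static int
should_try_intermediate (size_t nr_dies, size_t die_limit)
{
  return die_limit != 0 && nr_dies <= die_limit;
}
```

For inputs above the limit the check costs one comparison, so large programs
such as cc1 would skip the intermediate step and go straight to DIE
deduplication.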

