From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path:
Received: (qmail 124598 invoked by alias); 26 Mar 2019 13:50:43 -0000
Mailing-List: contact dwz-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Post:
List-Help:
List-Subscribe:
Sender: dwz-owner@sourceware.org
Received: (qmail 124563 invoked by uid 48); 26 Mar 2019 13:50:39 -0000
From: "vries at gcc dot gnu.org"
To: dwz@sourceware.org
Subject: [Bug default/24388] New: Disabling DIE deduplication improves compression for hello
Date: Tue, 01 Jan 2019 00:00:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: dwz
X-Bugzilla-Component: default
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: vries at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: nobody at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2019-q1/txt/msg00154.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=24388

            Bug ID: 24388
           Summary: Disabling DIE deduplication improves compression for
                    hello
           Product: dwz
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: default
          Assignee: nobody at sourceware dot org
          Reporter: vries at gcc dot gnu.org
                CC: dwz at sourceware dot org
  Target Milestone: ---

If we run dwz on a hello world executable, we measure a reduction of 8% in
size of the relevant debug sections:
...
$ gcc hello.c -g -o hello
$ dwz hello -o hello.dwz
$ diff.sh hello hello.dwz
.debug_info      red: 17%        1467    1221
.debug_abbrev    red: 7%         624     584
.debug_str       red: 0%         1619    1619
total            red: 8%         3710    3424
...

However, if we disable the DIE deduplication optimization, like so:
...
diff --git a/dwz.c b/dwz.c
index 045bda5..4b9b5e6 100644
--- a/dwz.c
+++ b/dwz.c
@@ -5038,8 +5038,8 @@ read_debug_info (DSO *dso, int kind)
          dump_dies (0, cu->cu_die);
 #endif
 
-         if (find_dups (cu->cu_die))
-           goto fail;
+         //if (find_dups (cu->cu_die))
+         //goto fail;
        }
       if (unlikely (kind == DEBUG_TYPES))
        {
@@ -11080,9 +11080,9 @@ dwz
   ret = read_dwarf (dso, quiet && outfile == NULL);
   if (ret)
     cleanup ();
-  else if (partition_dups ()
-          || create_import_tree ()
-          || (unlikely (fi_multifile)
+  else if (// partition_dups ()
+          // || create_import_tree ()
+          (unlikely (fi_multifile)
               && (remove_empty_pus ()
                   || read_macro (dso)))
           || read_debug_info (dso, DEBUG_TYPES)
...
we get a better result (12%) instead:
...
$ dwz hello -o hello.dwz.2
$ diff.sh hello hello.dwz.2
.debug_info      red: 20%        1467    1183
.debug_abbrev    red: 25%        624     474
.debug_str       red: 0%         1619    1619
total            red: 12%        3710    3276
...

It would be nice if we could pick up the 12% benefit here, by generating this
output as an intermediate step, and preferring it if it's smaller than the
result after the following DIE deduplication optimization.

I tried the same experiment with a cc1 (from pr24275, with the tentative fix
applied):
...
$ dwz cc1 -o cc1.dwz
$ diff.sh cc1 cc1.dwz
.debug_info      red: 45%        111527248       61570632
.debug_abbrev    red: 41%        1722726         1030935
.debug_str       red: 0%         6609355         6609355
total            red: 43%        119859329       69210922
$ dwz cc1 -o cc1.dwz.2
$ diff.sh cc1 cc1.dwz.2
.debug_info      red: 11%        111527248       100313798
.debug_abbrev    red: 11%        1722726         1542574
.debug_str       red: 0%         6609355         6609355
total            red: 10%        119859329       108465727
...
Here we see the opposite result.

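For readers who want to recompute the "red:" column: diff.sh itself is not
shown in this report, so the following is only a sketch under the assumption
that the percentage is 100 minus the truncated integer ratio new * 100 / old.
That formula happens to reproduce the figures in the listings above;
reduction_pct is a hypothetical helper name, not something taken from dwz or
diff.sh.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not part of dwz or diff.sh): percentage by which a
   section shrank, assuming truncating integer division of new * 100 / old.  */
unsigned
reduction_pct (uint64_t old_size, uint64_t new_size)
{
  /* A section that is empty or that grew is outside what this sketch
     models.  */
  assert (old_size > 0 && new_size <= old_size);
  return 100 - (unsigned) (new_size * 100 / old_size);
}
```

For example, reduction_pct (624, 584) evaluates to 7 and
reduction_pct (3710, 3276) to 12, matching the .debug_abbrev line of the first
hello run and the total line of the second.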
By disabling the intermediate step above some cut-off point (say, x DIEs), we
might be able to get:
- better compression for smaller programs
- without spending noticeable extra time for smaller programs
- without spending extra time for larger programs.
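The selection rule proposed above could be sketched roughly as follows. This
is not dwz code: struct candidate, enum choice, want_intermediate, pick_output
and the max_dies cut-off are all hypothetical names introduced for
illustration, and the sketch assumes the section sizes of both candidate
outputs are available once they have been generated.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical summary of one candidate output; not a dwz data structure.  */
struct candidate
{
  size_t debug_info_size;
  size_t debug_abbrev_size;
  size_t debug_str_size;
};

enum choice { USE_INTERMEDIATE, USE_DEDUPLICATED };

/* Sum of the three debug sections compared in the report.  */
size_t
total_size (const struct candidate *c)
{
  return c->debug_info_size + c->debug_abbrev_size + c->debug_str_size;
}

/* Only generate the intermediate (no DIE deduplication) output for inputs
   at or below the cut-off, so larger programs pay no extra time.  */
bool
want_intermediate (size_t nr_dies, size_t max_dies)
{
  return nr_dies <= max_dies;
}

/* Prefer the intermediate output only if it was generated and is strictly
   smaller than the deduplicated one.  INTERMEDIATE may be NULL when the
   intermediate step was skipped.  */
enum choice
pick_output (const struct candidate *intermediate,
             const struct candidate *deduplicated,
             size_t nr_dies, size_t max_dies)
{
  if (intermediate == NULL || !want_intermediate (nr_dies, max_dies))
    return USE_DEDUPLICATED;
  return (total_size (intermediate) < total_size (deduplicated)
          ? USE_INTERMEDIATE : USE_DEDUPLICATED);
}
```

With the hello sizes reported above (3276 bytes without deduplication, 3424
with), pick_output would return USE_INTERMEDIATE for any DIE count at or below
the cut-off; with the cc1 sizes it would return USE_DEDUPLICATED either way.
The report does not state DIE counts or a concrete cut-off, so any values
passed for nr_dies and max_dies are placeholders.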