From mboxrd@z Thu Jan 1 00:00:00 1970
Return-Path:
Received: (qmail 103706 invoked by alias); 29 Nov 2019 09:22:40 -0000
Mailing-List: contact dwz-help@sourceware.org; run by ezmlm
Precedence: bulk
List-Id:
List-Post:
List-Help:
List-Subscribe:
Sender: dwz-owner@sourceware.org
Received: (qmail 103659 invoked by uid 48); 29 Nov 2019 09:22:35 -0000
From: "vries at gcc dot gnu.org"
To: dwz@sourceware.org
Subject: [Bug default/25231] New: Reuse checksums
Date: Tue, 01 Jan 2019 00:00:00 -0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: dwz
X-Bugzilla-Component: default
X-Bugzilla-Version: unspecified
X-Bugzilla-Keywords:
X-Bugzilla-Severity: enhancement
X-Bugzilla-Who: vries at gcc dot gnu.org
X-Bugzilla-Status: NEW
X-Bugzilla-Resolution:
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: nobody at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags:
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status bug_severity priority component assigned_to reporter cc target_milestone
Message-ID:
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
X-SW-Source: 2019-q4/txt/msg00099.txt.bz2

https://sourceware.org/bugzilla/show_bug.cgi?id=25231

            Bug ID: 25231
           Summary: Reuse checksums
           Product: dwz
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: default
          Assignee: nobody at sourceware dot org
          Reporter: vries at gcc dot gnu.org
                CC: dwz at sourceware dot org
  Target Milestone: ---

For a dwz invocation for files 1 and 2 with multifile 3:
...
$ dwz -m 3 1 2
...
dwz goes through the following phases:
- regular-mode 1
- write-multifile 1
- regular-mode 2
- write-multifile 2
- optimize-multifile
- read-multifile
- finalize-multifile 1
- finalize-multifile 2

It would be nice if we could speed things up in, for instance,
finalize-multifile mode by reusing things done in regular-mode.

Let's focus for the moment on the handling of long unsigned int and size_t
(which is a typedef of long unsigned int):
...
 <1><2d>: Abbrev Number: 2 (DW_TAG_typedef)
    <2e>   DW_AT_name        : (indirect string, offset: 0x38): size_t
    <32>   DW_AT_decl_file   : 2
    <33>   DW_AT_decl_line   : 216
    <34>   DW_AT_type        : <0x38>
 <1><38>: Abbrev Number: 3 (DW_TAG_base_type)
    <39>   DW_AT_byte_size   : 8
    <3a>   DW_AT_encoding    : 7        (unsigned)
    <3b>   DW_AT_name        : (indirect string, offset: 0x1a0): long unsigned int
...

If we look at the checksums in the various phases, we get:
...
$ cp hello 1
$ cp 1 2
$ dwz -m 3 1 2 --devel-dump-dies --devel-trace 2>&1 \
    | egrep 'multifile|size_t|long unsigned int'
  2d O 1b0ea5ae b19e6191 size_t
  38 O 7c5b2022 7c5b2022 long unsigned int
Write-multifile 1
  2d O 1b0ea5ae b19e6191 size_t
  38 O 7c5b2022 7c5b2022 long unsigned int
Write-multifile 2
Optimize-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
 267 O ab012a69 bf55db67 size_t
 26f O 4bb43633 4bb43633 long unsigned int
Read-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
Compressing 1 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
Compressing 2 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
...

We can see that the checksum for long unsigned int is different in
regular-mode:
...
  38 O 7c5b2022 7c5b2022 long unsigned int
...
and in finalize-multifile mode:
...
  2e O 4bb43633 4bb43633 long unsigned int
...

That difference is caused by the handling of DW_FORM_strp, which is encoded
into the checksum using the offset into the string table rather than the
string contents. This is a speed optimization for regular-mode (but one which
assumes that the input file is optimally encoded, in the sense that each
unique string is encoded either using DW_FORM_strp, or DW_FORM_string, but not
both).
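
To illustrate why the two encodings give different results across files, here
is a minimal sketch. It is not dwz code: toy_hash is FNV-1a used as a stand-in
for whatever hash dwz applies, and checksum_strp_by_offset and
checksum_strp_by_contents are hypothetical helper names. It only shows that
hashing the offset ties the checksum to one particular .debug_str layout,
while hashing the contents does not.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a, a stand-in only; this is not the hash dwz uses.  */
static uint32_t
toy_hash (const void *data, size_t len)
{
  const unsigned char *p = data;
  uint32_t h = 2166136261u;
  for (size_t i = 0; i < len; i++)
    {
      h ^= p[i];
      h *= 16777619u;
    }
  return h;
}

/* Hypothetical: fold the DW_FORM_strp offset itself into the checksum.
   Cheap, but only comparable within one .debug_str section.  */
uint32_t
checksum_strp_by_offset (uint32_t offset)
{
  return toy_hash (&offset, sizeof offset);
}

/* Hypothetical: fold the string the offset points at into the checksum.
   Costs a strlen and a longer hash, but is comparable across sections.  */
uint32_t
checksum_strp_by_contents (const char *debug_str, size_t size, uint32_t offset)
{
  /* Reject offsets that point outside the string section.  */
  assert (offset < size);
  const char *s = debug_str + offset;
  return toy_hash (s, strlen (s) + 1);
}
```

Given two toy string sections that hold "long unsigned int" at offsets 7 and 0
respectively, the by-offset checksums differ while the by-contents checksums
are equal, which mirrors the difference seen between regular-mode and
finalize-multifile mode above.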

If we force dwz to pick up the string contents for DW_FORM_strp using this
patch:
...
diff --git a/dwz.c b/dwz.c
index 3c886d6..9ab3e33 100644
--- a/dwz.c
+++ b/dwz.c
@@ -2623,8 +2623,7 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
                }
              break;
            case DW_FORM_strp:
-             if (unlikely (op_multifile || rd_multifile || fi_multifile)
-                 && die->die_ck_state != CK_BAD)
+             if (die->die_ck_state != CK_BAD)
                {
                  value = read_32 (ptr);
                  if (value >= debug_sections[DEBUG_STR].size)
...
we get:
...
  2d O ab012a69 bf55db67 size_t
  38 O 4bb43633 4bb43633 long unsigned int
Write-multifile 1
  2d O ab012a69 bf55db67 size_t
  38 O 4bb43633 4bb43633 long unsigned int
Write-multifile 2
Optimize-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
 267 O ab012a69 bf55db67 size_t
 26f O 4bb43633 4bb43633 long unsigned int
Read-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
Compressing 1 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
Compressing 2 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
...

So now we have in regular mode:
...
  38 O 4bb43633 4bb43633 long unsigned int
...
and in finalize-multifile mode:
...
  2e O 4bb43633 4bb43633 long unsigned int
...

And for size_t in regular mode:
...
  2d O ab012a69 bf55db67 size_t
...
and in finalize-multifile mode:
...
  26 O ab012a69 bf55db67 size_t
...

Also, we can see that the checksums are actually the same in
optimize-multifile and read-multifile mode as well, so we could reuse them
there too.

There will be DIEs for which we can't reuse the checksums, due to differences
in handling references. I'm not sure what percentage of DIEs this would
apply to.

-- 
You are receiving this mail because:
You are on the CC list for the bug.