public inbox for dwz@sourceware.org
From: "vries at gcc dot gnu.org" <sourceware-bugzilla@sourceware.org>
To: dwz@sourceware.org
Subject: [Bug default/25231] New: Reuse checksums
Date: Tue, 01 Jan 2019 00:00:00 -0000
Message-ID: <bug-25231-11298@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=25231

            Bug ID: 25231
           Summary: Reuse checksums
           Product: dwz
           Version: unspecified
            Status: NEW
          Severity: enhancement
          Priority: P2
         Component: default
          Assignee: nobody at sourceware dot org
          Reporter: vries at gcc dot gnu.org
                CC: dwz at sourceware dot org
  Target Milestone: ---

For a dwz invocation for files 1 and 2 with multifile 3:
...
$ dwz -m 3 1 2
...
dwz goes through the following phases:
- regular-mode 1
- write-multifile 1
- regular-mode 2
- write-multifile 2
- optimize-multifile
- read-multifile
- finalize-multifile 1
- finalize-multifile 2
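
As a rough illustration of how this sequence grows with the number of input
files, here is a minimal sketch that only enumerates the phase names in the
order listed above. The names list_phases and PHASE_NAME_MAX are hypothetical
and do not appear in dwz.c.

```c
#include <stdio.h>

#define PHASE_NAME_MAX 32

/* Fill OUT with the phase sequence for NFILES input files plus one multifile.
   Return the number of phases written, or 0 if OUT has fewer than MAX
   entries available for them.  Hypothetical helper, not part of dwz.  */
static size_t
list_phases (int nfiles, char out[][PHASE_NAME_MAX], size_t max)
{
  size_t n = 0;

  if (nfiles < 1 || 3 * (size_t) nfiles + 2 > max)
    return 0;

  /* Each input file is first compressed on its own, then contributes
     candidate DIEs to the multifile.  */
  for (int i = 1; i <= nfiles; i++)
    {
      snprintf (out[n++], PHASE_NAME_MAX, "regular-mode %d", i);
      snprintf (out[n++], PHASE_NAME_MAX, "write-multifile %d", i);
    }

  /* The multifile is optimized once and read back once.  */
  snprintf (out[n++], PHASE_NAME_MAX, "optimize-multifile");
  snprintf (out[n++], PHASE_NAME_MAX, "read-multifile");

  /* Each input file is then processed a second time.  */
  for (int i = 1; i <= nfiles; i++)
    snprintf (out[n++], PHASE_NAME_MAX, "finalize-multifile %d", i);

  return n;
}
```

In this listing every input file is visited twice (once in regular-mode, once
in finalize-multifile mode), which is where reuse would pay off.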

It would be nice if we could speed things up, for instance in
finalize-multifile mode, by reusing work already done in regular-mode.

Let's focus for the moment on the handling of long unsigned int and size_t
(which is a typedef of long unsigned int):
...
 <1><2d>: Abbrev Number: 2 (DW_TAG_typedef)
    <2e>   DW_AT_name        : (indirect string, offset: 0x38): size_t
    <32>   DW_AT_decl_file   : 2
    <33>   DW_AT_decl_line   : 216
    <34>   DW_AT_type        : <0x38>
 <1><38>: Abbrev Number: 3 (DW_TAG_base_type)
    <39>   DW_AT_byte_size   : 8
    <3a>   DW_AT_encoding    : 7        (unsigned)
    <3b>   DW_AT_name        : (indirect string, offset: 0x1a0): long unsigned int
...

If we look at the checksums in the various phases, we get:
...
$ cp hello 1
$ cp 1 2
$ dwz -m 3 1 2 --devel-dump-dies --devel-trace 2>&1 \
  | egrep 'multifile|size_t|long unsigned int'
  2d O 1b0ea5ae b19e6191 size_t
  38 O 7c5b2022 7c5b2022 long unsigned int
Write-multifile 1
  2d O 1b0ea5ae b19e6191 size_t
  38 O 7c5b2022 7c5b2022 long unsigned int
Write-multifile 2
Optimize-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
  267 O ab012a69 bf55db67 size_t
  26f O 4bb43633 4bb43633 long unsigned int
Read-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
Compressing 1 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
Compressing 2 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
...
we can see that the checksum for long unsigned int is different in
regular-mode:
...
  38 O 7c5b2022 7c5b2022 long unsigned int
...
and finalize-multifile mode:
...
  2e O 4bb43633 4bb43633 long unsigned int
...

That difference is caused by the handling of DW_FORM_strp. In regular-mode, a
DW_FORM_strp attribute is encoded into the checksum using its offset into the
string table rather than the string contents. This is a speed optimization, but
it assumes that the input file is optimally encoded, in the sense that each
unique string is encoded using either DW_FORM_strp or DW_FORM_string, but not
both. In optimize-multifile, read-multifile and finalize-multifile mode, the
string contents are used instead.
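
To make the effect concrete, here is a minimal sketch comparing the two ways of
checksumming a DW_FORM_strp value. It assumes a stand-in hash (FNV-1a), which
is not the hash dwz uses, and two made-up string tables; the names hash_bytes,
checksum_strp_by_offset and checksum_strp_by_contents are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Stand-in hash (32-bit FNV-1a).  Assumption: any byte-wise hash shows the
   same effect; this is not the hash function used in dwz.c.  */
static uint32_t
hash_bytes (const void *data, size_t len, uint32_t h)
{
  const unsigned char *p = data;

  for (size_t i = 0; i < len; i++)
    {
      h ^= p[i];
      h *= 16777619u;
    }
  return h;
}

/* Checksum a DW_FORM_strp value by its offset into the string table.  */
static uint32_t
checksum_strp_by_offset (uint32_t offset)
{
  return hash_bytes (&offset, sizeof offset, 2166136261u);
}

/* Checksum a DW_FORM_strp value by the string it points to.  */
static uint32_t
checksum_strp_by_contents (const char *debug_str, uint32_t offset)
{
  const char *s = debug_str + offset;

  return hash_bytes (s, strlen (s) + 1, 2166136261u);
}

/* Two made-up string tables holding "long unsigned int" at different
   offsets, as can happen between an input file and the multifile.  */
static const char debug_str_a[] = "size_t\0long unsigned int";
static const char debug_str_b[] = "long unsigned int\0size_t";
enum { OFF_A = 7, OFF_B = 0 };
```

With these tables, checksum_strp_by_offset (OFF_A) and
checksum_strp_by_offset (OFF_B) differ, while checksum_strp_by_contents gives
the same value for both, which mirrors the mismatch seen above for
long unsigned int.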

If we force dwz to pick up the string contents for DW_FORM_strp using this
patch:
...
diff --git a/dwz.c b/dwz.c
index 3c886d6..9ab3e33 100644
--- a/dwz.c
+++ b/dwz.c
@@ -2623,8 +2623,7 @@ checksum_die (DSO *dso, dw_cu_ref cu, dw_die_ref top_die, dw_die_ref die)
            }
          break;
        case DW_FORM_strp:
-         if (unlikely (op_multifile || rd_multifile || fi_multifile)
-             && die->die_ck_state != CK_BAD)
+         if (die->die_ck_state != CK_BAD)
            {
              value = read_32 (ptr);
              if (value >= debug_sections[DEBUG_STR].size)
...
we get:
...
  2d O ab012a69 bf55db67 size_t
  38 O 4bb43633 4bb43633 long unsigned int
Write-multifile 1
  2d O ab012a69 bf55db67 size_t
  38 O 4bb43633 4bb43633 long unsigned int
Write-multifile 2
Optimize-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
  267 O ab012a69 bf55db67 size_t
  26f O 4bb43633 4bb43633 long unsigned int
Read-multifile
  14 O ab012a69 bf55db67 size_t
  1c O 4bb43633 4bb43633 long unsigned int
Compressing 1 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
Compressing 2 in finalize-multifile mode
  26 O ab012a69 bf55db67 size_t
  2e O 4bb43633 4bb43633 long unsigned int
...

So now we have in regular mode:
...
  38 O 4bb43633 4bb43633 long unsigned int
...
and in finalize-multifile mode:
...
  2e O 4bb43633 4bb43633 long unsigned int
...

And for size_t in regular mode:
...
  2d O ab012a69 bf55db67 size_t
...
and in finalize-multifile mode:
...
  26 O ab012a69 bf55db67 size_t
...

Also, the checksums in optimize-multifile and read-multifile mode are the same
as in the other modes, so we could reuse them there as well.

There will be DIEs for which we can't reuse the checksums, due to differences
in the handling of references. I'm not sure what percentage of DIEs this
applies to.
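
One possible shape for such reuse, purely as a sketch: remember the regular-mode
checksum per DIE and only trust it later if the DIE has no reference
attributes. The struct die_ck, its fields and try_reuse_checksum are
hypothetical and do not reflect dwz's actual data structures; the "no
references" rule is an assumed, conservative policy.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical per-DIE record; not the dw_die layout used in dwz.c.  */
struct die_ck
{
  uint32_t regular_ck;  /* Checksum computed in regular-mode.  */
  bool has_refs;        /* True if any attribute refers to another DIE.  */
};

/* If the regular-mode checksum of DIE can be reused, store it in *CK and
   return true.  Otherwise leave *CK untouched and return false, meaning the
   caller has to recompute the checksum for the current mode.
   Assumed policy: only DIEs without reference attributes are reusable.  */
static bool
try_reuse_checksum (const struct die_ck *die, uint32_t *ck)
{
  if (die->has_refs)
    return false;

  *ck = die->regular_ck;
  return true;
}
```

This policy is likely stricter than necessary: size_t above has a DW_AT_type
reference, and its checksums still matched between regular-mode and
finalize-multifile mode once the string contents were used.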

-- 
You are receiving this mail because:
You are on the CC list for the bug.