public inbox for dwz@sourceware.org
* [Bug default/25951] New: support for parallel processing?
@ 2020-05-08 12:52 samuel.thibault@ens-lyon.org
  2021-03-02  7:53 ` [Bug default/25951] " vries at gcc dot gnu.org
                   ` (7 more replies)
  0 siblings, 8 replies; 9+ messages in thread
From: samuel.thibault@ens-lyon.org @ 2020-05-08 12:52 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

            Bug ID: 25951
           Summary: support for parallel processing?
           Product: dwz
           Version: unspecified
            Status: NEW
          Severity: normal
          Priority: P2
         Component: default
          Assignee: nobody at sourceware dot org
          Reporter: samuel.thibault@ens-lyon.org
                CC: dwz at sourceware dot org
  Target Milestone: ---

Hello,

When applied to big packages (e.g. libreoffice), dwz takes a very long time,
although the work could be parallelized. Of course the inter-ELF factorization
would be difficult to parallelize, but runs without the -m option, and, with the
-m option, the first step that deduplicates within each ELF separately, could
probably be parallelized quite easily.

Samuel

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-03-02  7:53 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

Tom de Vries <vries at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |vries at gcc dot gnu.org
           Severity|normal                      |enhancement


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-03-10 10:19 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

--- Comment #1 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 13297
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13297&action=edit
Demonstrator patch

This demonstrator patch implements a simple form of multithreading, which only
works without:
- multifile (-m)
- hardlink (-h)
- low-mem limit 0 (-l0)

If a file hits the low-mem limit during the parallel phase, it's rerun in
low-mem mode after the parallel phase.
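
As an illustration only (this is not the demonstrator patch), the two-phase scheme can be sketched in Python as follows. The names `process_normal`, `process_low_mem` and `LOW_MEM_LIMIT`, and the idea of modelling a file by a single size number, are assumptions made for the sketch.

```python
from concurrent.futures import ThreadPoolExecutor

LOW_MEM_LIMIT = 1000  # hypothetical threshold, in arbitrary units


def process_normal(size):
    """Stand-in for regular processing; returns None when the limit is hit."""
    if size > LOW_MEM_LIMIT:
        return None  # signal: this file must be redone in low-mem mode
    return f"normal:{size}"


def process_low_mem(size):
    """Stand-in for the slower, memory-frugal mode."""
    return f"low-mem:{size}"


def run(sizes, jobs=4):
    # Parallel phase: every file is first attempted in normal mode.
    with ThreadPoolExecutor(max_workers=jobs) as pool:
        results = list(pool.map(process_normal, sizes))
    # Serial phase: files that hit the limit are rerun one at a time,
    # so at most one low-mem run is in flight.
    for i, result in enumerate(results):
        if result is None:
            results[i] = process_low_mem(sizes[i])
    return results
```

Running the low-mem reruns serially after the parallel phase keeps the memory-hungry files from competing with each other.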

It passes the test-suite.  There is only one thread-sanitizer warning left, for
multiple assignment of dwz_oom to obstack_alloc_failed_handler.

I did a build of the libreoffice package on openSUSE with dwz disabled,
harvested the resulting .debug files (in total 175 files, 685MB), and did a dwz
run (without multifile) using those files.

With master:
...
maxmem: 714956
real: 17.77
user: 15.76
system: 0.50
...

With the patch on top of master:
...
maxmem: 1106516
real: 10.37
user: 20.59
system: 1.46
...

So, the trade-off is as expected: faster realtime, but higher peak memory.
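
To put numbers on that, here is a small calculation using only the figures quoted above. It suggests roughly 1.7 times less wall-clock time, at the cost of about 1.5 times the peak memory and about 1.4 times the total CPU time (user plus system). The variable names are made up for the example.

```python
# Measurements copied from the two runs quoted above.
master = {"maxmem": 714956, "real": 17.77, "user": 15.76, "system": 0.50}
patched = {"maxmem": 1106516, "real": 10.37, "user": 20.59, "system": 1.46}

# Wall-clock speedup: how many times faster the patched run finishes.
speedup = master["real"] / patched["real"]

# Peak memory growth: how many times more memory the patched run needs.
mem_growth = patched["maxmem"] / master["maxmem"]

# Total CPU time growth (user + system), i.e. the overhead of parallelism.
cpu_growth = (patched["user"] + patched["system"]) / (
    master["user"] + master["system"]
)

print(f"speedup:    {speedup:.2f}x")
print(f"mem growth: {mem_growth:.2f}x")
print(f"cpu growth: {cpu_growth:.2f}x")
```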

dwz, though, has a low-mem mode to keep memory usage in check, so that it can be
used on 32-bit systems with still relatively large files.  So the trade-off on
those systems may not be advantageous.  We could fix this by not enabling
parallel processing on such systems.

OTOH, we could also spawn processes instead of threads.  That means the
per-process peak memory does not increase.  It would also mean less messy code
changes (not having to use __thread all over the place).

An initial version that doesn't deal with multifile (like this demonstrator
patch) wouldn't need many changes.  A version that supports multifile would
need a switch to indicate the location of the dwz.debug_info etc. files.
So, something like:
...
$ dwz -m 3 1 2
 create temp dir /tmp/abcdef
 spawn dwz 1 --multifile-dir /tmp/abcdef
 spawn dwz 2 --multifile-dir /tmp/abcdef
 wait for 2 spawned processes to finish ...
 spawned dwz 1 - compressing
 spawned dwz 2 - compressing
 spawned dwz 1 - multifile write (using dir /tmp/abcdef)
 spawned dwz 2 - multifile write (using dir /tmp/abcdef)
 spawned dwz 1 - done
 spawned dwz 2 - done
 waiting done
 multifile optimize (using files in /tmp/abcdef)
 multifile read
 multifile finalize 1
 multifile finalize 2
...
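
A rough sketch of the parent side of that scheme, assuming POSIX fork/waitpid as exposed by Python's `os` module. `compress_and_write` is a hypothetical stand-in for a spawned dwz, and the `.part` file naming is invented for the example; the later "multifile optimize/read/finalize" steps are left out.

```python
import os
import tempfile


def compress_and_write(name, multifile_dir):
    """Hypothetical stand-in for one spawned dwz: writes its contribution."""
    path = os.path.join(multifile_dir, name + ".part")
    with open(path, "w") as out:
        out.write(f"chunk from {name}\n")


def run_parallel(files):
    """Spawn one process per file, then wait for all of them to finish."""
    multifile_dir = tempfile.mkdtemp()  # "create temp dir"
    pids = []
    for name in files:
        pid = os.fork()
        if pid == 0:
            # Child: do the work and exit without returning into the
            # parent's code path.
            code = 1
            try:
                compress_and_write(name, multifile_dir)
                code = 0
            finally:
                os._exit(code)
        pids.append(pid)

    # "wait for N spawned processes to finish"
    all_ok = True
    for pid in pids:
        _, status = os.waitpid(pid, 0)
        if not (os.WIFEXITED(status) and os.WEXITSTATUS(status) == 0):
            all_ok = False
    return multifile_dir, all_ok
```

Since each file is handled in its own process, the per-process peak memory stays what it would be for a single-file run.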


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-03-23 20:22 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

--- Comment #2 from Tom de Vries <vries at gcc dot gnu.org> ---
Posted RFC: https://sourceware.org/pipermail/dwz/2021q1/001166.html


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-03-26 11:47 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

--- Comment #3 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #2)
> Posted RFC: https://sourceware.org/pipermail/dwz/2021q1/001166.html

And committed at
https://sourceware.org/git/?p=dwz.git;a=commit;h=7755593c86b701547ec276320533efc3e4c165f3
.

Note that this still does not apply when multifile is used.


* [Bug default/25951] support for parallel processing?
From: jakub at redhat dot com @ 2021-03-26 11:51 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

Jakub Jelinek <jakub at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |jakub at redhat dot com

--- Comment #4 from Jakub Jelinek <jakub at redhat dot com> ---
For multifile, perhaps each fork could fill in its own set of multifiles and
then they'd be merged together before being processed.
But we need to ensure reproducibility, so the order in which the multifile
chunks from different programs/shared libraries are merged back needs to be
independent of the number of forks.
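
One way to read this requirement, sketched under assumptions: tag each chunk with the position of its input file on the command line and merge by that position, not by the order in which the forks happen to finish. `merge_chunks` and the `(index, bytes)` representation are hypothetical and only illustrate the ordering idea.

```python
def merge_chunks(completions):
    """Merge multifile chunks in input order.

    `completions` is a list of (input_index, chunk_bytes) pairs in the
    order the forks finished, which may differ from run to run.
    """
    # Sorting by input index removes any dependence on completion order,
    # and therefore on the number of forks and on scheduling.
    ordered = sorted(completions, key=lambda pair: pair[0])
    return b"".join(chunk for _, chunk in ordered)


# Two runs that finished in different orders yield the same merged result.
run_a = [(1, b"libfoo;"), (0, b"prog;"), (2, b"libbar;")]
run_b = [(2, b"libbar;"), (1, b"libfoo;"), (0, b"prog;")]
merged_a = merge_chunks(run_a)
merged_b = merge_chunks(run_b)
```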


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-03-26 16:42 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

--- Comment #5 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Jakub Jelinek from comment #4)
> For multifile, perhaps each fork could fill in its own set of multifiles and
> then they'd be merged together before being processed.
> But we need to ensure reproducibility, so the order in which the multifile
> chunks from different programs/shared libraries are merged back needs to be
> independent of the number of forks.

I've posted a first parallel+multifile implementation that does not yet have
reproducibility (though it does have reproducible compression AFAIU):
https://sourceware.org/pipermail/dwz/2021q1/001197.html .


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-03-31  7:18 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

--- Comment #6 from Tom de Vries <vries at gcc dot gnu.org> ---
(In reply to Tom de Vries from comment #3)
> (In reply to Tom de Vries from comment #2)
> > Posted RFC: https://sourceware.org/pipermail/dwz/2021q1/001166.html
> 
> And committed at
> https://sourceware.org/git/?p=dwz.git;a=commit;h=7755593c86b701547ec276320533efc3e4c165f3 .
> 
> Note that this still does not apply when multifile is used.

And committed:
https://sourceware.org/git/?p=dwz.git;a=commit;h=64ea1adcda52d22f00f17e219bc8e023b62b9a03
.

Now -j works for multifile as well, provided -e and -p are used.


* [Bug default/25951] support for parallel processing?
From: vries at gcc dot gnu.org @ 2021-04-12  8:22 UTC (permalink / raw)
  To: dwz

https://sourceware.org/bugzilla/show_bug.cgi?id=25951

--- Comment #7 from Tom de Vries <vries at gcc dot gnu.org> ---
Created attachment 13362
  --> https://sourceware.org/bugzilla/attachment.cgi?id=13362&action=edit
Demonstrator source file using separate reaper/coordinator

(In reply to Tom de Vries from comment #6)
> Now -j works for multifile as well, provided -e and -p are used.

For the last step, to make multifile work with -j without -e/-p, the
communication scheme needs to be more elaborate.

The parent needs to both:
- reap the children
- communicate with the children about the multifile

It cannot do both tasks in blocking fashion. It could do them in a non-blocking
fashion, but then it would have to busy-wait, which is bad.

The solution I came up with is to have the parent spawn a separate process, the
coordinator.

Then the job of the parent is to reap children.

The job of the coordinator is to communicate with the children about the
multifile: the children request permission to contribute to the multifile, for
a certain endianness/pointer-size type.  The coordinator replies whether and
when that's OK.

When the parent reaps a child, it notifies the coordinator to ensure that the
coordinator is not stuck waiting for a request from that child.
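
The reaper/coordinator split can be sketched roughly as below. This is not the attached demonstrator, only a simplified Python model built on several assumptions: the line-based `R`/`X` message format, the one-byte `y`/`n` replies, granting turns in input order, and letting the first granted request fix the multifile's type are all invented for illustration. Every wait is a blocking `read` or `wait`, so nothing polls.

```python
import os


def worker(idx, kind, req_w, reply_r, out_path):
    """Child: request permission, wait for the verdict, contribute if allowed."""
    try:
        if kind is not None:  # None models a child with nothing to contribute
            os.write(req_w, f"R {idx} {kind}\n".encode())
            if os.read(reply_r, 1) == b"y":  # blocks until the coordinator answers
                with open(out_path, "a") as out:
                    out.write(f"{idx}:{kind}\n")
    finally:
        os._exit(0)


def coordinator(n, req_r, reply_ws):
    """Grant turns in input order; the first granted kind fixes the multifile kind."""
    try:
        requests = os.fdopen(req_r)
        pending, exited, multifile_kind = {}, set(), None
        for turn in range(n):
            answered = False
            while turn not in exited:
                if turn in pending and not answered:
                    if multifile_kind is None:
                        multifile_kind = pending[turn]
                    ok = pending[turn] == multifile_kind
                    os.write(reply_ws[turn], b"y" if ok else b"n")
                    answered = True
                fields = requests.readline().split()  # blocking read
                if not fields:
                    break  # end of file: every writer is gone
                if fields[0] == "R":
                    pending[int(fields[1])] = fields[2]
                else:  # "X": the parent reaped this child
                    exited.add(int(fields[1]))
    finally:
        os._exit(0)


def run(kinds, out_path):
    """Parent: spawn the coordinator and the workers, then only reap."""
    n = len(kinds)
    req_r, req_w = os.pipe()
    replies = [os.pipe() for _ in range(n)]
    coord_pid = os.fork()
    if coord_pid == 0:
        os.close(req_w)
        coordinator(n, req_r, [w for _, w in replies])
    workers = {}
    for idx, kind in enumerate(kinds):
        pid = os.fork()
        if pid == 0:
            worker(idx, kind, req_w, replies[idx][0], out_path)
        workers[pid] = idx
    while workers:
        pid, _ = os.wait()  # blocking wait for any child
        if pid in workers:
            # Tell the coordinator, so it never waits on a child that is gone.
            os.write(req_w, f"X {workers.pop(pid)}\n".encode())
    os.waitpid(coord_pid, 0)
    for fd in (req_r, req_w, *(fd for pair in replies for fd in pair)):
        os.close(fd)
```

In this model the exit notification does double duty: it unblocks the coordinator when a child never asks, and it marks the end of a granted child's contribution, so the next turn can start. Short writes to a pipe are atomic on POSIX systems, which is what lets the children and the parent share one request pipe.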

