* Re: [Highlight] Performance improvements
2019-01-01 0:00 [Highlight] Performance improvements Tom de Vries
@ 2019-01-01 0:00 ` Martin Liška
2019-01-01 0:00 ` Tom de Vries
0 siblings, 1 reply; 6+ messages in thread
From: Martin Liška @ 2019-01-01 0:00 UTC (permalink / raw)
To: Tom de Vries, dwz, Jakub Jelinek, Mark Wielaard, Michael Matz
On 11/26/19 6:59 PM, Tom de Vries wrote:
> Hi,
>
> I've been working on performance improvements for dwz, using a cc1
> binary as my optimization vehicle.
>
> Comparing the situation:
> - before (commit 04a676d Add --devel-partition-dups-opt), and
> - after (current master, commit e405c62 Add --devel-die-count-method
> {none,estimate})
> I get the following results.
>
> When avoiding running into the low-mem die-limit using -lnone, we get
> ~25% performance improvement, due to an improved hash function and an
> improved hash table allocation strategy (without increasing peak memory
> usage):
> ...
> real: mean: 7378.10 100.00% stddev: 45.31
>       mean: 5558.80  75.34% stddev: 35.18
> user: mean: 7106.30 100.00% stddev: 41.53
>       mean: 5328.10  74.98% stddev: 22.33
> sys:  mean:  271.60 100.00% stddev: 39.57
>       mean:  230.00  84.68% stddev: 40.45
> ...
>
> And if we don't avoid running into the low-mem die-limit, we get ~38%
> performance improvement:
> ...
> real: mean: 15084.80 100.00% stddev: 44.53
>       mean:  9232.90  61.21% stddev: 41.80
> user: mean: 14759.40 100.00% stddev: 30.62
>       mean:  9100.10  61.66% stddev: 41.75
> sys:  mean:   324.00 100.00% stddev: 39.51
>       mean:   132.00  40.74% stddev: 27.26
> ...
> which is also paired with a reduction in peak memory usage of ~34%, from
> 0.95GB to 0.63GB, due to running into the low-mem die-limit in a more
> efficient manner.
Hi.
That sounds very promising! I would like to see it being used in our openSUSE
package. Are you planning to use it?
Thanks,
Martin
>
> Thanks,
> - Tom
>
^ permalink raw reply [flat|nested] 6+ messages in thread
* Re: [Highlight] Performance improvements
2019-01-01 0:00 ` Martin Liška
@ 2019-01-01 0:00 ` Tom de Vries
2021-12-23 11:57 ` Martin Liška
0 siblings, 1 reply; 6+ messages in thread
From: Tom de Vries @ 2019-01-01 0:00 UTC (permalink / raw)
To: Martin Liška, dwz, Jakub Jelinek, Mark Wielaard, Michael Matz
On 27-11-2019 13:52, Martin Liška wrote:
> On 11/26/19 6:59 PM, Tom de Vries wrote:
>> Hi,
>>
>> I've been working on performance improvements for dwz, using a cc1
>> binary as my optimization vehicle.
>>
>> Comparing the situation:
>> - before (commit 04a676d Add --devel-partition-dups-opt), and
>> - after (current master, commit e405c62 Add --devel-die-count-method
>>   {none,estimate})
>> I get the following results.
>>
>> When avoiding running into the low-mem die-limit using -lnone, we get
>> ~25% performance improvement, due to an improved hash function and an
>> improved hash table allocation strategy (without increasing peak memory
>> usage):
>> ...
>> real: mean: 7378.10 100.00% stddev: 45.31
>>       mean: 5558.80  75.34% stddev: 35.18
>> user: mean: 7106.30 100.00% stddev: 41.53
>>       mean: 5328.10  74.98% stddev: 22.33
>> sys:  mean:  271.60 100.00% stddev: 39.57
>>       mean:  230.00  84.68% stddev: 40.45
>> ...
>>
>> And if we don't avoid running into the low-mem die-limit, we get ~38%
>> performance improvement:
>> ...
>> real: mean: 15084.80 100.00% stddev: 44.53
>>       mean:  9232.90  61.21% stddev: 41.80
>> user: mean: 14759.40 100.00% stddev: 30.62
>>       mean:  9100.10  61.66% stddev: 41.75
>> sys:  mean:   324.00 100.00% stddev: 39.51
>>       mean:   132.00  40.74% stddev: 27.26
>> ...
>> which is also paired with a reduction in peak memory usage of ~34%, from
>> 0.95GB to 0.63GB, due to running into the low-mem die-limit in a more
>> efficient manner.
>
> Hi.
>
> That sounds very promising! I would like to see it being used in our
> openSUSE package. Are you planning to use it?
>
For the dwz openSUSE package I follow the usual strategy: backport
bugfixes and upgrade to newer releases, once available.
So, the intention is that this lands in openSUSE with the next release.
I'm currently working on a dwz bug fix; once that is done, and if I
manage to finalize the ODR work as well, I think it'll be time for a
new release.
Thanks,
- Tom
* [Highlight] Performance improvements
@ 2019-01-01 0:00 Tom de Vries
2019-01-01 0:00 ` Martin Liška
0 siblings, 1 reply; 6+ messages in thread
From: Tom de Vries @ 2019-01-01 0:00 UTC (permalink / raw)
To: dwz, Jakub Jelinek, Mark Wielaard, Michael Matz, Martin Liska
Hi,
I've been working on performance improvements for dwz, using a cc1
binary as my optimization vehicle.
Comparing the situation:
- before (commit 04a676d Add --devel-partition-dups-opt), and
- after (current master, commit e405c62 Add --devel-die-count-method
{none,estimate})
I get the following results.
When avoiding running into the low-mem die-limit using -lnone, we get
~25% performance improvement, due to an improved hash function and an
improved hash table allocation strategy (without increasing peak memory
usage):
...
real: mean: 7378.10 100.00% stddev: 45.31
      mean: 5558.80  75.34% stddev: 35.18
user: mean: 7106.30 100.00% stddev: 41.53
      mean: 5328.10  74.98% stddev: 22.33
sys:  mean:  271.60 100.00% stddev: 39.57
      mean:  230.00  84.68% stddev: 40.45
...
And if we don't avoid running into the low-mem die-limit, we get ~38%
performance improvement:
...
real: mean: 15084.80 100.00% stddev: 44.53
      mean:  9232.90  61.21% stddev: 41.80
user: mean: 14759.40 100.00% stddev: 30.62
      mean:  9100.10  61.66% stddev: 41.75
sys:  mean:   324.00 100.00% stddev: 39.51
      mean:   132.00  40.74% stddev: 27.26
...
which is also paired with a reduction in peak memory usage of ~34%, from
0.95GB to 0.63GB, due to running into the low-mem die-limit in a more
efficient manner.
Thanks,
- Tom
* Re: [Highlight] Performance improvements
2019-01-01 0:00 ` Tom de Vries
@ 2021-12-23 11:57 ` Martin Liška
2022-01-03 22:06 ` Mark Wielaard
0 siblings, 1 reply; 6+ messages in thread
From: Martin Liška @ 2021-12-23 11:57 UTC (permalink / raw)
To: Tom de Vries, dwz, Jakub Jelinek, Mark Wielaard, Michael Matz
Hello.
I've run a couple of experiments on dwz speed. I took the following packages:
gcc, krita, libetonyek, rtags and sysdig, and ran dwz -m x ... on them.
Here are the numbers I collected for the following configurations:
dwz (the system package, built with LTO and -O2); dwz-O2_lto, which is supposed
to be the same but built from source; then -O3 and PGO builds (with the PGO
training run based on 4 copies of tramp3d). The final run uses an experimental
patch of mine that replaces iterative_hash with xxhash:
https://github.com/Cyan4973/xxHash
# 1/5: sysdig (60M)
dwz : 10.0
dwz : 9.8 (98.7%)
dwz-O2_lto : 9.5 (95.6%)
dwz-O3_lto : 9.2 (91.9%)
dwz-O3_lto_pgo : 8.1 (81.3%)
dwz-O3_lto_pgo_xxhash : 7.3 (72.9%)
# 2/5: rtags (148M)
dwz : 19.6
dwz : 19.6 (99.9%)
dwz-O2_lto : 17.4 (89.0%)
dwz-O3_lto : 16.7 (85.4%)
dwz-O3_lto_pgo : 14.4 (73.6%)
dwz-O3_lto_pgo_xxhash : 13.2 (67.6%)
# 3/5: libetonyek (112M)
dwz : 10.5
dwz : 10.5 (100.6%)
dwz-O2_lto : 10.8 (102.8%)
dwz-O3_lto : 10.1 (96.7%)
dwz-O3_lto_pgo : 9.1 (87.4%)
dwz-O3_lto_pgo_xxhash : 8.1 (77.1%)
# 4/5: krita (685M)
dwz : 133.7
dwz : 134.3 (100.5%)
dwz-O2_lto : 95.3 (71.3%)
dwz-O3_lto : 91.2 (68.2%)
dwz-O3_lto_pgo : 78.9 (59.0%)
dwz-O3_lto_pgo_xxhash : 71.6 (53.5%)
# 5/5: gcc (1.2G)
dwz : 61.9
dwz : 61.9 (99.9%)
dwz-O2_lto : 58.5 (94.5%)
dwz-O3_lto : 56.6 (91.3%)
dwz-O3_lto_pgo : 54.1 (87.4%)
dwz-O3_lto_pgo_xxhash : 51.7 (83.4%)
As seen, using -O3 really helps. One gets a bigger binary, but since dwz is small,
the growth is negligible:
bloaty dwz-O3_lto -- dwz-O2_lto
FILE SIZE VM SIZE
-------------- --------------
+28% +50.3Ki [ = ] 0 .debug_loclists
+18% +25.3Ki +18% +25.3Ki .text
+12% +24.6Ki [ = ] 0 .debug_info
+16% +17.3Ki [ = ] 0 .debug_line
+31% +6.19Ki [ = ] 0 .debug_rnglists
+11% +689 [ = ] 0 .debug_abbrev
+7.1% +633 [ = ] 0 .strtab
+5.5% +504 +5.5% +504 .eh_frame
+1.3% +453 [ = ] 0 .debug_str
+0.8% +375 +0.8% +375 .rodata
+2.8% +336 [ = ] 0 .symtab
+11% +64 [ = ] 0 .debug_aranges
+4.2% +64 +4.4% +64 .eh_frame_hdr
[ = ] 0 +1.8% +32 .bss
-3.1% -21 -3.1% -21 [LOAD #2 [RX]]
-61.0% -2.20Ki [ = ] 0 [Unmapped]
+16% +124Ki +13% +26.2Ki TOTAL
PGO also helps significantly. Finally, using xxhash gives another 5-10%
improvement.
For now I'm suggesting using -O3 and PGO for our openSUSE package:
https://build.opensuse.org/request/show/942235
Upstream questions I have:
- What about replacing -O2 with -O3 by default?
- Are you interested in the xxhash patch? Do you want it as a conditional build
or may I replace the currently existing hash function?
Cheers,
Martin
* Re: [Highlight] Performance improvements
2021-12-23 11:57 ` Martin Liška
@ 2022-01-03 22:06 ` Mark Wielaard
2022-01-05 8:01 ` Martin Liška
0 siblings, 1 reply; 6+ messages in thread
From: Mark Wielaard @ 2022-01-03 22:06 UTC (permalink / raw)
To: Martin Liška; +Cc: Tom de Vries, dwz, Jakub Jelinek, Michael Matz
Hi Martin,
I noticed that this is a reply to a thread from 2 years ago. Is it
related to the work mentioned by Tom in that thread?
On Thu, Dec 23, 2021 at 12:57:48PM +0100, Martin Liška wrote:
> I've run a couple of experiments on dwz speed. I took the following packages:
> gcc, krita, libetonyek, rtags and sysdig, and ran dwz -m x ... on them.
>
> Here are the numbers I collected for the following configurations:
> dwz (the system package, built with LTO and -O2); dwz-O2_lto, which is supposed
> to be the same but built from source; then -O3 and PGO builds (with the PGO
> training run based on 4 copies of tramp3d). The final run uses an experimental
> patch of mine that replaces iterative_hash with xxhash:
> https://github.com/Cyan4973/xxHash
>
> # 1/5: sysdig (60M)
> dwz : 10.0
> dwz : 9.8 (98.7%)
> dwz-O2_lto : 9.5 (95.6%)
> dwz-O3_lto : 9.2 (91.9%)
> dwz-O3_lto_pgo : 8.1 (81.3%)
> dwz-O3_lto_pgo_xxhash : 7.3 (72.9%)
> # 2/5: rtags (148M)
> dwz : 19.6
> dwz : 19.6 (99.9%)
> dwz-O2_lto : 17.4 (89.0%)
> dwz-O3_lto : 16.7 (85.4%)
> dwz-O3_lto_pgo : 14.4 (73.6%)
> dwz-O3_lto_pgo_xxhash : 13.2 (67.6%)
> # 3/5: libetonyek (112M)
> dwz : 10.5
> dwz : 10.5 (100.6%)
> dwz-O2_lto : 10.8 (102.8%)
> dwz-O3_lto : 10.1 (96.7%)
> dwz-O3_lto_pgo : 9.1 (87.4%)
> dwz-O3_lto_pgo_xxhash : 8.1 (77.1%)
> # 4/5: krita (685M)
> dwz : 133.7
> dwz : 134.3 (100.5%)
> dwz-O2_lto : 95.3 (71.3%)
> dwz-O3_lto : 91.2 (68.2%)
> dwz-O3_lto_pgo : 78.9 (59.0%)
> dwz-O3_lto_pgo_xxhash : 71.6 (53.5%)
> # 5/5: gcc (1.2G)
> dwz : 61.9
> dwz : 61.9 (99.9%)
> dwz-O2_lto : 58.5 (94.5%)
> dwz-O3_lto : 56.6 (91.3%)
> dwz-O3_lto_pgo : 54.1 (87.4%)
> dwz-O3_lto_pgo_xxhash : 51.7 (83.4%)
>
> As seen, using -O3 really helps. One gets a bigger binary, but since dwz is small,
> the growth is negligible:
>
> bloaty dwz-O3_lto -- dwz-O2_lto
> FILE SIZE VM SIZE
> -------------- --------------
> +28% +50.3Ki [ = ] 0 .debug_loclists
> +18% +25.3Ki +18% +25.3Ki .text
> +12% +24.6Ki [ = ] 0 .debug_info
> +16% +17.3Ki [ = ] 0 .debug_line
> +31% +6.19Ki [ = ] 0 .debug_rnglists
> +11% +689 [ = ] 0 .debug_abbrev
> +7.1% +633 [ = ] 0 .strtab
> +5.5% +504 +5.5% +504 .eh_frame
> +1.3% +453 [ = ] 0 .debug_str
> +0.8% +375 +0.8% +375 .rodata
> +2.8% +336 [ = ] 0 .symtab
> +11% +64 [ = ] 0 .debug_aranges
> +4.2% +64 +4.4% +64 .eh_frame_hdr
> [ = ] 0 +1.8% +32 .bss
> -3.1% -21 -3.1% -21 [LOAD #2 [RX]]
> -61.0% -2.20Ki [ = ] 0 [Unmapped]
> +16% +124Ki +13% +26.2Ki TOTAL
>
> PGO also helps significantly. Finally, using xxhash gives another 5-10%
> improvement.
>
> For now I'm suggesting using -O3 and PGO for our openSUSE package:
> https://build.opensuse.org/request/show/942235
>
> Upstream questions I have:
> - What about replacing -O2 with -O3 by default?
Did you test that without -flto? If it still gets a ~5% speedup then I
like that idea. Or maybe we should also include -flto by default?
> - Are you interested in the xxhash patch? Do you want it as a conditional build
> or may I replace the currently existing hash function?
I think it is best to simply replace the existing hash function
instead of making it a conditional thing.
Does it rely on having the libxxhash dynamic library available or
would we simply embed a copy (replacing the hashtab.[ch] files)?
Cheers,
Mark
* Re: [Highlight] Performance improvements
2022-01-03 22:06 ` Mark Wielaard
@ 2022-01-05 8:01 ` Martin Liška
0 siblings, 0 replies; 6+ messages in thread
From: Martin Liška @ 2022-01-05 8:01 UTC (permalink / raw)
To: Mark Wielaard; +Cc: Tom de Vries, dwz, Jakub Jelinek, Michael Matz
On 1/3/22 23:06, Mark Wielaard wrote:
> Hi Martin,
>
> I noticed that this is a reply to a thread from 2 years ago. Is it
> related to the work mentioned by Tom in that thread?
Hello.
It's only loosely related, in that it's also about performance improvements :)
>
> On Thu, Dec 23, 2021 at 12:57:48PM +0100, Martin Liška wrote:
>> I've run a couple of experiments on dwz speed. I took the following packages:
>> gcc, krita, libetonyek, rtags and sysdig, and ran dwz -m x ... on them.
>>
>> Here are the numbers I collected for the following configurations:
>> dwz (the system package, built with LTO and -O2); dwz-O2_lto, which is supposed
>> to be the same but built from source; then -O3 and PGO builds (with the PGO
>> training run based on 4 copies of tramp3d). The final run uses an experimental
>> patch of mine that replaces iterative_hash with xxhash:
>> https://github.com/Cyan4973/xxHash
>>
>> # 1/5: sysdig (60M)
>> dwz : 10.0
>> dwz : 9.8 (98.7%)
>> dwz-O2_lto : 9.5 (95.6%)
>> dwz-O3_lto : 9.2 (91.9%)
>> dwz-O3_lto_pgo : 8.1 (81.3%)
>> dwz-O3_lto_pgo_xxhash : 7.3 (72.9%)
>> # 2/5: rtags (148M)
>> dwz : 19.6
>> dwz : 19.6 (99.9%)
>> dwz-O2_lto : 17.4 (89.0%)
>> dwz-O3_lto : 16.7 (85.4%)
>> dwz-O3_lto_pgo : 14.4 (73.6%)
>> dwz-O3_lto_pgo_xxhash : 13.2 (67.6%)
>> # 3/5: libetonyek (112M)
>> dwz : 10.5
>> dwz : 10.5 (100.6%)
>> dwz-O2_lto : 10.8 (102.8%)
>> dwz-O3_lto : 10.1 (96.7%)
>> dwz-O3_lto_pgo : 9.1 (87.4%)
>> dwz-O3_lto_pgo_xxhash : 8.1 (77.1%)
>> # 4/5: krita (685M)
>> dwz : 133.7
>> dwz : 134.3 (100.5%)
>> dwz-O2_lto : 95.3 (71.3%)
>> dwz-O3_lto : 91.2 (68.2%)
>> dwz-O3_lto_pgo : 78.9 (59.0%)
>> dwz-O3_lto_pgo_xxhash : 71.6 (53.5%)
>> # 5/5: gcc (1.2G)
>> dwz : 61.9
>> dwz : 61.9 (99.9%)
>> dwz-O2_lto : 58.5 (94.5%)
>> dwz-O3_lto : 56.6 (91.3%)
>> dwz-O3_lto_pgo : 54.1 (87.4%)
>> dwz-O3_lto_pgo_xxhash : 51.7 (83.4%)
>>
>> As seen, using -O3 really helps. One gets a bigger binary, but since dwz is small,
>> the growth is negligible:
>>
>> bloaty dwz-O3_lto -- dwz-O2_lto
>> FILE SIZE VM SIZE
>> -------------- --------------
>> +28% +50.3Ki [ = ] 0 .debug_loclists
>> +18% +25.3Ki +18% +25.3Ki .text
>> +12% +24.6Ki [ = ] 0 .debug_info
>> +16% +17.3Ki [ = ] 0 .debug_line
>> +31% +6.19Ki [ = ] 0 .debug_rnglists
>> +11% +689 [ = ] 0 .debug_abbrev
>> +7.1% +633 [ = ] 0 .strtab
>> +5.5% +504 +5.5% +504 .eh_frame
>> +1.3% +453 [ = ] 0 .debug_str
>> +0.8% +375 +0.8% +375 .rodata
>> +2.8% +336 [ = ] 0 .symtab
>> +11% +64 [ = ] 0 .debug_aranges
>> +4.2% +64 +4.4% +64 .eh_frame_hdr
>> [ = ] 0 +1.8% +32 .bss
>> -3.1% -21 -3.1% -21 [LOAD #2 [RX]]
>> -61.0% -2.20Ki [ = ] 0 [Unmapped]
>> +16% +124Ki +13% +26.2Ki TOTAL
>>
>> PGO also helps significantly. Finally, using xxhash gives another 5-10%
>> improvement.
>>
>> For now I'm suggesting using -O3 and PGO for our openSUSE package:
>> https://build.opensuse.org/request/show/942235
>>
>> Upstream questions I have:
>> - What about replacing -O2 with -O3 by default?
>
> Did you test that without -flto? If it still gets a ~5% speedup then I
Yep:
# 1/5: sysdig (60M)
dwz_O2 : 9.7
dwz_O2_xxhash : 8.5 (87.7%)
# 2/5: rtags (58M)
dwz_O2 : 17.6
dwz_O2_xxhash : 15.8 (89.5%)
# 3/5: libetonyek (91M)
dwz_O2 : 10.8
dwz_O2_xxhash : 9.4 (86.7%)
# 4/5: krita (685M)
dwz_O2 : 96.0
dwz_O2_xxhash : 85.6 (89.1%)
# 5/5: gcc (1.2G)
dwz_O2 : 58.6
dwz_O2_xxhash : 54.1 (92.4%)
> like that idea. Or maybe we should also include -flto by default?
Well, that's probably something distributions can decide. Maybe we could
add a default dwz.spec file?
>
>> - Are you interested in the xxhash patch? Do you want it as a conditional build
>> or may I replace the currently existing hash function?
>
> I think it is best to simply replace the existing hash function
> instead of making it a conditional thing.
Fine, I'm going to prepare a patch.
>
> Does it rely on having the libxxhash dynamic library available or
> would we simply embed a copy (replacing the hashtab.[ch] files)?
I would not embed a copy, as it may become outdated quite fast. I would rather link
against the standard shared library (similar to libelf).
Martin
>
> Cheers,
>
> Mark
>
Thread overview: 6+ messages
2019-01-01 0:00 [Highlight] Performance improvements Tom de Vries
2019-01-01 0:00 ` Martin Liška
2019-01-01 0:00 ` Tom de Vries
2021-12-23 11:57 ` Martin Liška
2022-01-03 22:06 ` Mark Wielaard
2022-01-05 8:01 ` Martin Liška