public inbox for gcc-bugs@sourceware.org
* [Bug other/60828] New: Compile time speedups when using tcmalloc
From: trippels at gcc dot gnu.org @ 2014-04-11 20:15 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60828

            Bug ID: 60828
           Summary: Compile time speedups when using tcmalloc
           Product: gcc
           Version: 4.9.0
            Status: UNCONFIRMED
          Severity: normal
          Priority: P3
         Component: other
          Assignee: unassigned at gcc dot gnu.org
          Reporter: trippels at gcc dot gnu.org

There are noticeable compile time speedups when gcc is linked with
tcmalloc. This happens mostly for C++ programs; plain C projects
show little difference.
Here are the compile times for Firefox on my 4-core machine:

Firefox -O3:
glibc malloc:
 2806.82s user 126.92s system 349% cpu 13:58.37 total    0% speedup
tcmalloc:
 2707.31s user 129.93s system 358% cpu 13:10.61 total  5.7% speedup
jemalloc:
 2708.30s user 175.53s system 354% cpu 13:34.29 total  2.9% speedup

Firefox -flto=4 -O3: 
glibc malloc:
 3241.66s user 155.71s system 316% cpu 17:54.13 total    0% speedup
tcmalloc:
 3140.43s user 164.22s system 323% cpu 17:01.13 total  4.9% speedup
jemalloc:
 3155.74s user 226.63s system 320% cpu 17:35.51 total  1.7% speedup

A simpler example is tramp3d-v4:
glibc malloc:
 % time g++ -w -O3 -march=native tramp3d-v4.cpp
 22.30s user 0.34s system 97% cpu 23.301 total
tcmalloc:
 % time g++ -w -O3 -march=native tramp3d-v4.cpp
 21.36s user 0.30s system 99% cpu 21.659 total    (~7% speedup)

tcmalloc's built-in heap profiler shows the following (number of
allocated megabytes, including space that has since been deallocated):

markus@x4 ~ % pprof --alloc_space --text /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/cc1 /tmp/mybin.hprof_4474.0010.heap
Using local file /usr/libexec/gcc/x86_64-pc-linux-gnu/4.9.0/cc1.
Using local file /tmp/mybin.hprof_4474.0010.heap.
Total: 34.3 MB
     7.7  22.6%  22.6%      7.8  22.6% c_common_nodes_and_builtins [clone .cold.171]
     5.7  16.7%  39.3%      5.7  16.7% tree_ssa_lim
     4.3  12.5%  51.8%     10.8  31.5% cpp_classify_number
     3.8  11.1%  62.9%      5.2  15.1% do_endif [clone .lto_priv.2364]
     2.6   7.5%  70.4%      2.6   7.5% _cpp_pop_context
     2.6   7.5%  77.8%      2.6   7.5% cgraph_add_node_removal_hook
     2.2   6.5%  84.3%      2.2   6.5% __gmp_default_allocate
     1.7   5.1%  89.4%      1.7   5.1% rtx_moveable_p [clone .isra.7] [clone .lto_priv.5842]
     1.5   4.2%  93.6%      1.7   5.1% add_exit_phis [clone .lto_priv.5880]
     0.7   2.1%  95.7%      0.7   2.1% ix86_target_macros_internal [clone .lto_priv.7319]
     0.3   0.9%  96.6%      0.3   0.9% init_alias_vars [clone .lto_priv.9038]
     0.3   0.8%  97.4%      0.3   0.8% gimple_fold_builtin
...

And total objects (including deallocated):
Total: 619253 objects
  290259  46.9%  46.9%   290259  46.9% __gmp_default_allocate
   89866  14.5%  61.4%    89866  14.5% rtx_moveable_p [clone .isra.7] [clone .lto_priv.5842]
   74190  12.0%  73.4%   107769  17.4% cpp_classify_number
   66198  10.7%  84.1%    66243  10.7% do_endif [clone .lto_priv.2364]
   44778   7.2%  91.3%    44778   7.2% _cpp_pop_context
   20931   3.4%  94.7%    20939   3.4% simplify_plus_minus [clone .lto_priv.5851]
    8642   1.4%  96.1%    11749   1.9% expand_asm_operands [clone .lto_priv.6838]
    5665   0.9%  97.0%     5801   0.9% c_common_nodes_and_builtins [clone .cold.171]
    4659   0.8%  97.7%     4659   0.8% merge_classes [clone .part.41] [clone .lto_priv.3432]
    3659   0.6%  98.3%     3773   0.6% init_alias_vars [clone .lto_priv.9038]
    2541   0.4%  98.7%     2541   0.4% tree_ssa_lim



* [Bug other/60828] Compile time speedups when using tcmalloc
From: trippels at gcc dot gnu.org @ 2014-04-12 11:18 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60828

Markus Trippelsdorf <trippels at gcc dot gnu.org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |WONTFIX

--- Comment #1 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
In the case of clang, which uses heap allocations much more heavily
than gcc, the speedup almost doubles:

Firefox -O3 (clang):
glibc malloc:
 2305.72s user  60.49s system 331% cpu 11:54.73 total    0% speedup
tcmalloc:
 2201.04s user  78.25s system 355% cpu 10:41.27 total 10.3% speedup
jemalloc:
 2231.16s user 102.52s system 342% cpu 11:21.12 total  4.7% speedup

I'm closing this issue because it really belongs in the glibc
bugzilla.



* [Bug other/60828] Compile time speedups when using tcmalloc
From: rguenth at gcc dot gnu.org @ 2014-04-14  8:50 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60828

--- Comment #2 from Richard Biener <rguenth at gcc dot gnu.org> ---
Well, it's useful to point out the main malloc/free offenders that might
be better served by a more suitable allocation strategy, such as obstacks.

     7.7  22.6%  22.6%      7.8  22.6% c_common_nodes_and_builtins [clone .cold.171]

Where does that allocate?

     5.7  16.7%  39.3%      5.7  16.7% tree_ssa_lim

Yeah, I have some old patches for that ...

     4.3  12.5%  51.8%     10.8  31.5% cpp_classify_number

Not exactly clear where that allocates ...

     3.8  11.1%  62.9%      5.2  15.1% do_endif [clone .lto_priv.2364]

Also in libcpp, but it already uses obstacks.



* [Bug other/60828] Compile time speedups when using tcmalloc
From: trippels at gcc dot gnu.org @ 2014-04-14 12:29 UTC (permalink / raw)
  To: gcc-bugs

http://gcc.gnu.org/bugzilla/show_bug.cgi?id=60828

--- Comment #3 from Markus Trippelsdorf <trippels at gcc dot gnu.org> ---
I looked at more profiles over the weekend, and bitmaps always
showed up at the top.
As Honza said on IRC, bitmaps are allocated on obstacks, and obstacks
haven't been optimized since the 80s. So it looks like tcmalloc simply
handles obstack allocations better than the stock glibc malloc.


