* about glibc performance
@ 2019-03-22 3:20 Jimmie
2019-03-22 3:38 ` Carlos O'Donell
2019-03-22 21:39 ` Eric Wong
0 siblings, 2 replies; 21+ messages in thread
From: Jimmie @ 2019-03-22 3:20 UTC (permalink / raw)
To: libc-help
Hi,
For several days, I have been testing the memory-allocation performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my results indicate that glibc is more efficient than tcmalloc.
Generally, people think tcmalloc is more efficient than glibc 2.3, but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc performance.
Looking forward to your reply, thank you.
--
Jimmie
^ permalink raw reply [flat|nested] 21+ messages in thread
* Re: about glibc performance
2019-03-22 3:20 about glibc performance Jimmie
@ 2019-03-22 3:38 ` Carlos O'Donell
2019-03-22 4:22 ` Jimmie
2019-03-22 13:56 ` Siddhesh Poyarekar
2019-03-22 21:39 ` Eric Wong
1 sibling, 2 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-22 3:38 UTC (permalink / raw)
To: Jimmie, libc-help
On 3/21/19 11:20 PM, Jimmie wrote:
> Hi, For several days, I have been testing the memory-allocation
> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
> results indicate that glibc is more efficient than tcmalloc.
> Generally, people think tcmalloc is more efficient than glibc 2.3,
> but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
> performance. Looking forward to your reply, thank you.
There were no changes in 2.17 which improved malloc performance.
In 2.23 we fixed a malloc issue with the free list becoming cyclic,
causing decreased performance (due to increased contention).
That will improve performance.
In 2.26 we added a per-thread cache to malloc, so that should
also improve performance.
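
A minimal sketch, not part of the original reply, of how one might check at run time whether the glibc in use is new enough to contain the two changes named above. `gnu_get_libc_version()` is a real glibc function; the helper names `parse_libc_version`, `version_at_least`, and `report_malloc_features` are hypothetical. Distribution builds may backport fixes, so the upstream version number is only a rough guide.

```c
#include <gnu/libc-version.h>  /* gnu_get_libc_version (glibc-specific) */
#include <stdio.h>

/* Parse "MAJOR.MINOR[...]" into two ints; returns 1 on success, 0 on failure. */
int parse_libc_version(const char *text, int *ver_major, int *ver_minor)
{
    return sscanf(text, "%d.%d", ver_major, ver_minor) == 2;
}

/* Returns 1 if ver_major.ver_minor is at least want_major.want_minor. */
int version_at_least(int ver_major, int ver_minor, int want_major, int want_minor)
{
    return ver_major > want_major ||
           (ver_major == want_major && ver_minor >= want_minor);
}

/* Print which of the malloc changes mentioned in this thread the running
   glibc should contain, judged by the upstream version number alone. */
void report_malloc_features(void)
{
    int ver_major = 0, ver_minor = 0;
    const char *version = gnu_get_libc_version();

    if (!parse_libc_version(version, &ver_major, &ver_minor)) {
        printf("could not parse glibc version \"%s\"\n", version);
        return;
    }
    printf("glibc %s\n", version);
    printf("cyclic free-list fix (2.23): %s\n",
           version_at_least(ver_major, ver_minor, 2, 23) ? "yes" : "no");
    printf("per-thread cache (2.26):     %s\n",
           version_at_least(ver_major, ver_minor, 2, 26) ? "yes" : "no");
}
```

Calling `report_malloc_features()` from `main` on a stock upstream 2.17 build would be expected to report "no" for both changes.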
--
Cheers,
Carlos.
* Re:Re: about glibc performance
2019-03-22 3:38 ` Carlos O'Donell
@ 2019-03-22 4:22 ` Jimmie
2019-03-22 12:21 ` Florian Weimer
2019-03-22 13:56 ` Siddhesh Poyarekar
1 sibling, 1 reply; 21+ messages in thread
From: Jimmie @ 2019-03-22 4:22 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: libc-help
Thank you, but I am unsure why Google's test report indicates that tcmalloc is much better than glibc. Does it depend on the environment my program runs in? By the way, it is running on 'Red Hat 4.8.4, x86_64'.
It would be a great benefit to me if you could give me some advice, O(∩_∩)O.
At 2019-03-22 11:38:38, "Carlos O'Donell" <codonell@redhat.com> wrote:
>On 3/21/19 11:20 PM, Jimmie wrote:
>> Hi, For several days, I have been testing the memory-allocation
>> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>> results indicate that glibc is more efficient than tcmalloc.
>> Generally, people think tcmalloc is more efficient than glibc 2.3,
>> but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
>> performance. Looking forward to your reply, thank you.
>
>There were no changes in 2.17 which improved malloc performance.
>
>In 2.23 we fixed a malloc issue with the free list becoming cyclic,
>causing decreased performance (due to increased contention).
>That will improve performance.
>
>In 2.26 we added a per-thread cache to malloc, so that should
>also improve performance.
>
>--
>Cheers,
>Carlos.
* Re: about glibc performance
2019-03-22 4:22 ` Jimmie
@ 2019-03-22 12:21 ` Florian Weimer
0 siblings, 0 replies; 21+ messages in thread
From: Florian Weimer @ 2019-03-22 12:21 UTC (permalink / raw)
To: Jimmie; +Cc: Carlos O'Donell, libc-help
* Jimmie:
> thank you, but it makes me feel uncertain that why google's test
> report indicates that their tcmalloc is much better than glibc, does
> it based on the environment my program runs?
It depends on the environment and the application and its application
profile. You may also have looked at outdated benchmark results.
Subsequent changes in tcmalloc heuristics could have made tcmalloc
perform worse in this particular benchmark.
In general, very few people report performance issues that prompt them
to switch from tcmalloc to glibc malloc because if tcmalloc does not
perform well for them, they will not switch away from glibc malloc in
the first place.
* Re: about glibc performance
2019-03-22 3:38 ` Carlos O'Donell
2019-03-22 4:22 ` Jimmie
@ 2019-03-22 13:56 ` Siddhesh Poyarekar
2019-03-22 14:53 ` Carlos O'Donell
` (2 more replies)
1 sibling, 3 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-22 13:56 UTC (permalink / raw)
To: Carlos O'Donell; +Cc: Jimmie, libc-help
On Fri, 22 Mar 2019 at 09:08, Carlos O'Donell <codonell@redhat.com> wrote:
>
> On 3/21/19 11:20 PM, Jimmie wrote:
> > Hi, For several days, I have been testing the memory-allocation
> > performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
> > results indicate that glibc is more efficient than tcmalloc.
> > Generally, people think tcmalloc is more efficient than glibc 2.3,
> > but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
> > performance. Looking forward to your reply, thank you.
>
> There were no changes in 2.17 which improved malloc performance.
Actually, there were performance improvements to malloc between 2.3
and 2.17, primarily the per-thread allocator that greatly reduced
contention for multi-threaded applications. I've argued in the past
that the per-thread allocator should bring performance of a number of
applications on par if not better than tcmalloc/jemalloc, but I never
did a formal run and so never wrote a formal rebuttal of the tcmalloc
claims. If you've done formal tests, please do publish them!
Siddhesh
--
https://siddhesh.in
* Re: about glibc performance
2019-03-22 13:56 ` Siddhesh Poyarekar
@ 2019-03-22 14:53 ` Carlos O'Donell
2019-03-22 15:18 ` Patrick McGehearty
2019-03-22 15:49 ` Paul Pluzhnikov via libc-help
2019-03-25 7:41 ` Jimmie
2 siblings, 1 reply; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-22 14:53 UTC (permalink / raw)
To: Siddhesh Poyarekar; +Cc: Jimmie, libc-help
On 3/22/19 9:55 AM, Siddhesh Poyarekar wrote:
> On Fri, 22 Mar 2019 at 09:08, Carlos O'Donell <codonell@redhat.com> wrote:
>>
>> On 3/21/19 11:20 PM, Jimmie wrote:
>>> Hi, For several days, I have been testing the memory-allocation
>>> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>>> results indicate that glibc is more efficient than tcmalloc.
>>> Generally, people think tcmalloc is more efficient than glibc 2.3,
>>> but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
>>> performance. Looking forward to your reply, thank you.
>>
>> There were no changes in 2.17 which improved malloc performance.
>
> Actually, there were performance improvements to malloc between 2.3
> and 2.17, primarily the per-thread allocator that greatly reduced
> contention for multi-threaded applications. I've argued in the past
> that the per-thread allocator should bring performance of a number of
> applications on par if not better than tcmalloc/jemalloc, but I never
> did a formal run and so never wrote a formal rebuttal of the tcmalloc
> claims. If you've done formal tests, please do publish them!
Oh, right, certainly in 2.15 we switched on per-thread allocators!
I forgot all about that. And from 2.10 to 2.15 it could have been on
with --enable-experimental-malloc.
--
Cheers,
Carlos.
* Re: about glibc performance
2019-03-22 14:53 ` Carlos O'Donell
@ 2019-03-22 15:18 ` Patrick McGehearty
2019-03-22 16:15 ` Siddhesh Poyarekar
0 siblings, 1 reply; 21+ messages in thread
From: Patrick McGehearty @ 2019-03-22 15:18 UTC (permalink / raw)
To: libc-help
There have been about 80 patches which modify files in the
malloc directory of the glibc src tree between 2.17 and 2.28.
2.17 was first available around 2013 and 2.28 in 2018.
Many fixes are minor, some are not. They include fixing race
conditions, fixing boundary condition computations, and
some are performance improvements.
For the performance improvements, in addition to offering
the thread-based allocation areas, glibc malloc has also
improved the code to decide when to return memory to the
system, reducing the thrash-effect of frequent small and
medium malloc/free combinations. That is a significant benefit
for C++ and Java programs that are written in a style
which does frequent "new/allocate and free" operations.
At least one of the SPEC cpu2017 benchmarks benefits
from this improvement.
Whenever comparing malloc replacements, be sure you
have recent versions of the allocators as there has
been considerable activity in improvement of several
of these allocators over the last 5-10 years.
On 3/22/2019 9:53 AM, Carlos O'Donell wrote:
> On 3/22/19 9:55 AM, Siddhesh Poyarekar wrote:
>> On Fri, 22 Mar 2019 at 09:08, Carlos O'Donell <codonell@redhat.com>
>> wrote:
>>>
>>> On 3/21/19 11:20 PM, Jimmie wrote:
>>>> Hi, For several days, I have been testing the memory-allocation
>>>> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>>>> results indicate that glibc is more efficient than tcmalloc.
>>>> Generally, people think tcmalloc is more efficient than glibc 2.3,
>>>> but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
>>>> performance. Looking forward to your reply, thank you.
>>>
>>> There were no changes in 2.17 which improved malloc performance.
>>
>> Actually, there were performance improvements to malloc between 2.3
>> and 2.17, primarily the per-thread allocator that greatly reduced
>> contention for multi-threaded applications. I've argued in the past
>> that the per-thread allocator should bring performance of a number of
>> applications on par if not better than tcmalloc/jemalloc, but I never
>> did a formal run and so never wrote a formal rebuttal of the tcmalloc
>> claims. If you've done formal tests, please do publish them!
>
> Oh, right, certainly in 2.15 we switched on per-thread allocators!
> I forgot all about that. And from 2.10 to 2.15 it could have been on
> with --enable-experimental-malloc.
>
* Re: about glibc performance
2019-03-22 13:56 ` Siddhesh Poyarekar
2019-03-22 14:53 ` Carlos O'Donell
@ 2019-03-22 15:49 ` Paul Pluzhnikov via libc-help
2019-03-22 16:13 ` Siddhesh Poyarekar
2019-03-25 7:41 ` Jimmie
2 siblings, 1 reply; 21+ messages in thread
From: Paul Pluzhnikov via libc-help @ 2019-03-22 15:49 UTC (permalink / raw)
To: Siddhesh Poyarekar; +Cc: Carlos O'Donell, Jimmie, libc-help
On Fri, Mar 22, 2019 at 6:56 AM Siddhesh Poyarekar
<siddhesh.poyarekar@gmail.com> wrote:
> I've argued in the past
> that the per-thread allocator should bring performance of a number of
> applications on par if not better than tcmalloc/jemalloc, but I never
> did a formal run and so never wrote a formal rebuttal of the tcmalloc
> claims.
The claims you didn't formally rebut are probably these:
http://goog-perftools.sourceforge.net/doc/tcmalloc.html
Note that that document is from 2005, and talks about benchmarking
against GLIBC 2.3 on Intel P4 processors in 32-bit mode.
It's not like TCMalloc has stood still for the last 15 years, but I
don't believe there have been any recent open-source releases of it.
--
Paul Pluzhnikov
* Re: about glibc performance
2019-03-22 15:49 ` Paul Pluzhnikov via libc-help
@ 2019-03-22 16:13 ` Siddhesh Poyarekar
0 siblings, 0 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-22 16:13 UTC (permalink / raw)
To: Paul Pluzhnikov; +Cc: Carlos O'Donell, Jimmie, libc-help
On Fri, 22 Mar 2019 at 21:18, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
> The claims you didn't formally rebut are probably these:
> http://goog-perftools.sourceforge.net/doc/tcmalloc.html
I know, IIRC at least 5 people had pointed me to that link back in the
day to tell me that I was wasting my time working on glibc and that I
should be working on the clouds ;)
> Note that that document is from 2005, and talks about benchmarking
> against GLIBC 2.3 on Intel P4 processors in 32-bit mode.
>
> It's not like TCMalloc has stood still for the last 15 years, but I
> don't believe there have been any recent open-source releases of it.
Agreed, I didn't mean to imply otherwise. I mainly want to point out
that the assertion in that blog post is quite outdated and that the
comparison is a lot closer today. Not only that, there will be
massive variations based on use cases and such evaluations may only
end up being of academic interest.
Siddhesh
--
http://siddhesh.in
* Re: about glibc performance
2019-03-22 15:18 ` Patrick McGehearty
@ 2019-03-22 16:15 ` Siddhesh Poyarekar
0 siblings, 0 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-22 16:15 UTC (permalink / raw)
To: Patrick McGehearty; +Cc: libc-help
On Fri, 22 Mar 2019 at 20:48, Patrick McGehearty
<patrick.mcgehearty@oracle.com> wrote:
> Whenever comparing malloc replacements, be sure you
> have recent versions of the allocators as there has
> been considerable activity in improvement of several
> of these allocators over the last 5-10 years.
... and also look at various tuning options[1] in the allocator that
may potentially have significant effects on performance.
Siddhesh
[1] https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html#Memory-Allocation-Tunables
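
As a hedged illustration of what such tuning can look like: the linked page describes the `GLIBC_TUNABLES` environment variable available in newer glibc releases, while older releases such as 2.17 can be adjusted from inside the program with `mallopt(3)`. The sketch below uses only real `mallopt` parameters; the function name `apply_malloc_tuning` and any values passed to it are hypothetical, not recommendations.

```c
#include <assert.h>
#include <malloc.h>  /* mallopt, M_ARENA_MAX, M_MMAP_THRESHOLD, M_TRIM_THRESHOLD */

/* Apply a few allocator settings before a benchmark starts.
   Returns 1 if every mallopt call succeeded, 0 otherwise.

   On glibc releases with tunables, a roughly equivalent setup without
   recompiling would be, for example:
     GLIBC_TUNABLES=glibc.malloc.arena_max=4:glibc.malloc.mmap_threshold=262144 */
int apply_malloc_tuning(int arena_max, int mmap_threshold, int trim_threshold)
{
    assert(arena_max > 0 && mmap_threshold >= 0 && trim_threshold >= 0);

    int ok = 1;
    /* mallopt returns 1 on success and 0 on error. */
    ok &= (mallopt(M_ARENA_MAX, arena_max) == 1);           /* cap the number of arenas */
    ok &= (mallopt(M_MMAP_THRESHOLD, mmap_threshold) == 1); /* requests this large use mmap;
                                                               setting it disables the
                                                               dynamic threshold */
    ok &= (mallopt(M_TRIM_THRESHOLD, trim_threshold) == 1); /* free top-of-heap bytes that
                                                               trigger a trim */
    return ok;
}
```

A benchmark would call this once at the start of `main`, before any threads are created, and report the settings alongside its timings so that runs remain comparable.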
* Re: about glibc performance
2019-03-22 3:20 about glibc performance Jimmie
2019-03-22 3:38 ` Carlos O'Donell
@ 2019-03-22 21:39 ` Eric Wong
2019-03-22 22:20 ` Konstantin Kharlamov
2019-03-23 2:05 ` Carlos O'Donell
1 sibling, 2 replies; 21+ messages in thread
From: Eric Wong @ 2019-03-22 21:39 UTC (permalink / raw)
To: Jimmie; +Cc: libc-help
Jimmie <zpjjimmie@163.com> wrote:
> Hi,
> For several days, I have been testing the memory-allocation performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my results indicate that glibc is more efficient than tcmalloc.
> Generally, people think tcmalloc is more efficient than glibc 2.3, but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc performance.
> Looking forward to your reply, thank you.
You may also want to check out my proof-of-concept wfcqueue patch
which optimizes message passing:
https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/
Unfortunately, integrating wfcqueue/URCU into glibc will take
much effort :<
* Re: about glibc performance
2019-03-22 21:39 ` Eric Wong
@ 2019-03-22 22:20 ` Konstantin Kharlamov
2019-03-22 22:38 ` Eric Wong
2019-03-23 2:05 ` Carlos O'Donell
1 sibling, 1 reply; 21+ messages in thread
From: Konstantin Kharlamov @ 2019-03-22 22:20 UTC (permalink / raw)
To: Eric Wong; +Cc: Jimmie, libc-help
On Sat, Mar 23, 2019 at 12:38 AM, Eric Wong
<normalperson@yhbt.net> wrote:
> Jimmie <zpjjimmie@163.com> wrote:
>> Hi,
>> For several days, I have been testing the memory-allocation
>> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>> results indicate that glibc is more efficient than tcmalloc.
>> Generally, people think tcmalloc is more efficient than glibc 2.3,
>> but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
>> performance.
>> Looking forward to your reply, thank you.
>
> You may also want to check out my proof-of-concept wfcqueue patch
> which optimizes message passing:
>
> https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/
>
> Unfortunately, integrating wfcqueue/URCU into glibc will take
> much effort :<
Also in future, I think, "restartable sequences" should improve malloc
performance, right? I'm wondering, when will they appear…
* Re: about glibc performance
2019-03-22 22:20 ` Konstantin Kharlamov
@ 2019-03-22 22:38 ` Eric Wong
2019-03-23 1:49 ` Carlos O'Donell
0 siblings, 1 reply; 21+ messages in thread
From: Eric Wong @ 2019-03-22 22:38 UTC (permalink / raw)
To: Konstantin Kharlamov; +Cc: Jimmie, libc-help
Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
> Also in future, I think, "restartable sequences" should improve malloc
> performance, right? I'm wondering, when will they appear…
Yes, they're complementary ideas. Looks like
rseq is being discussed in libc-alpha, lately.
* Re: about glibc performance
2019-03-22 22:38 ` Eric Wong
@ 2019-03-23 1:49 ` Carlos O'Donell
0 siblings, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-23 1:49 UTC (permalink / raw)
To: Eric Wong, Konstantin Kharlamov; +Cc: Jimmie, libc-help
On 3/22/19 6:38 PM, Eric Wong wrote:
> Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
>> Also in future, I think, "restartable sequences" should improve malloc
>> performance, right? I'm wondering, when will they appear…
>
> Yes, they're complementary ideas. Looks like
> rseq is being discussed in libc-alpha, lately.
I just did a review of the rseq patches, and I think the
biggest sticking point is the registration ref-count
interface, but Mathieu has proposed something even simpler
that should work, and now we'll try to get buy in from
the various senior reviewers.
--
Cheers,
Carlos.
* Re: about glibc performance
2019-03-22 21:39 ` Eric Wong
2019-03-22 22:20 ` Konstantin Kharlamov
@ 2019-03-23 2:05 ` Carlos O'Donell
1 sibling, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-23 2:05 UTC (permalink / raw)
To: Eric Wong, Jimmie; +Cc: libc-help
On 3/22/19 5:38 PM, Eric Wong wrote:
> Jimmie <zpjjimmie@163.com> wrote:
>> Hi, For several days, I have been testing the memory-allocation
>> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>> results indicate that glibc is more efficient than tcmalloc.
>> Generally, people think tcmalloc is more efficient than glibc 2.3,
>> but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
>> performance. Looking forward to your reply, thank you.
>
> You may also want to check out my proof-of-concept wfcqueue patch
> which optimizes message passing:
>
> https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/
>
> Unfortunately, integrating wfcqueue/URCU into glibc will take much
> effort :<
FAOD I really like the direction these patches take glibc's malloc.
I just don't have time to push your idea to completion. I know it's
a lot of effort to push novel ideas forward, but it touches
pieces of code which are used by a lot of programs. The best tooling
we have today is malloc trace/simulation to capture and compare
workloads before and after:
https://pagure.io/glibc-malloc-trace-utils
So the hard part is not the changes to the code, but it's in doing
the before/after performance comparison for a variety of workloads
and capturing those workloads with the tracer so others can look
at and run them. We don't even have a good place to store large
workloads (we're talking to overseers to see if we can enable some
git-annex support on sourceware, git lfs is still too new).
--
Cheers,
Carlos.
* Re:Re: about glibc performance
2019-03-22 13:56 ` Siddhesh Poyarekar
2019-03-22 14:53 ` Carlos O'Donell
2019-03-22 15:49 ` Paul Pluzhnikov via libc-help
@ 2019-03-25 7:41 ` Jimmie
2019-03-25 9:37 ` Konstantin Kharlamov
2019-03-25 9:47 ` Siddhesh Poyarekar
2 siblings, 2 replies; 21+ messages in thread
From: Jimmie @ 2019-03-25 7:41 UTC (permalink / raw)
To: Siddhesh Poyarekar; +Cc: Carlos O'Donell, libc-help
It seems that I can't send attachments to libc-help, so I will simply describe my test results.
malloc and free 10000 times per thread; the data is below (the left column is the allocation size per malloc, and the other columns are the elapsed time):
size      glibc  tcmalloc  tbbmalloc
1         213us  821us     168us
10        215    820       175
50        208    832       186
100       204    808       194
500       250    852       183
1000      428    859       180
5000      414    892       190
8128      389    880       194
8129      388    891       525
10000     392    838       554
100000    332    846       562
262144    321    852       546
262145    312    2045      600
1000000   312    2126      555
10000000  331    4645      1228

malloc and free in 4 threads, 5000 times per thread:
size      glibc  tcmalloc  tbbmalloc
1         284    1629      186
10        297    1093      143
48        285    1252      151
49        282    552       150
100       283    556       157
1000      322    510       168
5000      313    529       162
8128      312    528       173
8129      332    597       1350
10000     324    589       1425
100000    316    535       1428
262144    319    534       1524
262145    328    32596     1545
1000000   321    27106     1330
10000000  323    34590     14141

I can provide my test code if necessary.

For convenience, you could also use the benchmark from here: https://github.com/gperftools/gperftools/tree/master/benchmark.
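
The test code itself was not posted to the list. As a hedged sketch only, a single-threaded loop of the kind described (repeated malloc and free of one fixed size, timed in microseconds) might look like the following. The function name `time_malloc_free_us` and the choice of `CLOCK_MONOTONIC` are assumptions, not the original test.

```c
#define _POSIX_C_SOURCE 200809L  /* for clock_gettime and CLOCK_MONOTONIC */
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <time.h>

/* Time `iterations` malloc/free pairs of `size` bytes each.
   Returns the elapsed time in microseconds, or -1 if an allocation fails. */
int64_t time_malloc_free_us(size_t size, int iterations)
{
    struct timespec start, end;

    assert(size > 0 && iterations > 0);
    clock_gettime(CLOCK_MONOTONIC, &start);
    for (int i = 0; i < iterations; i++) {
        volatile char *block = malloc(size);
        if (block == NULL)
            return -1;
        block[0] = 1;          /* touch the block so the pair is not optimized away */
        free((void *)block);
    }
    clock_gettime(CLOCK_MONOTONIC, &end);

    return (int64_t)(end.tv_sec - start.tv_sec) * 1000000
         + (end.tv_nsec - start.tv_nsec) / 1000;
}
```

To compare allocators with a loop like this, the same binary would typically be run once against the system malloc and once with the alternative allocator preloaded or linked in, for example calling `time_malloc_free_us(8128, 10000)` and `time_malloc_free_us(8129, 10000)` to probe a size-class boundary. Because every iteration frees the block it just allocated, this measures only the fastest reuse path of each allocator.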
At 2019-03-22 21:55:47, "Siddhesh Poyarekar" <siddhesh.poyarekar@gmail.com> wrote:
>On Fri, 22 Mar 2019 at 09:08, Carlos O'Donell <codonell@redhat.com> wrote:
>>
>> On 3/21/19 11:20 PM, Jimmie wrote:
>> > Hi, For several days, I have been testing the memory-allocation
>> > performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>> > results indicate that glibc is more efficient than tcmalloc.
>> > Generally, people think tcmalloc is more efficient than glibc 2.3,
>> > but I use glibc 2.17, so I wonder whether glibc 2.17 improved malloc
>> > performance. Looking forward to your reply, thank you.
>>
>> There were no changes in 2.17 which improved malloc performance.
>
>Actually, there were performance improvements to malloc between 2.3
>and 2.17, primarily the per-thread allocator that greatly reduced
>contention for multi-threaded applications. I've argued in the past
>that the per-thread allocator should bring performance of a number of
>applications on par if not better than tcmalloc/jemalloc, but I never
>did a formal run and so never wrote a formal rebuttal of the tcmalloc
>claims. If you've done formal tests, please do publish them!
>
>Siddhesh
>--
>https://siddhesh.in
* Re:Re: about glibc performance
2019-03-25 7:41 ` Jimmie
@ 2019-03-25 9:37 ` Konstantin Kharlamov
2019-03-25 9:47 ` Siddhesh Poyarekar
1 sibling, 0 replies; 21+ messages in thread
From: Konstantin Kharlamov @ 2019-03-25 9:37 UTC (permalink / raw)
To: Jimmie; +Cc: Siddhesh Poyarekar, Carlos O'Donell, libc-help
On Mon, Mar 25, 2019 at 10:37:11, Jimmie <zpjjimmie@163.com> wrote:
> It seems like that I can't send attachment to libc-help. so I simply
> describe my test results.<br/>malloc and free 10000 times in
> per-thread, the datas is below(left column represent memsize per
> malloc. and the other column represent the cost time it uses, )<br/>
> glibc tcmalloc tbbmalloc<br/>1 213us
> 821us 168us<br/>10 215 820 175<br/>50
> 208 832 186<br/>100 204 808
> 194<br/>500 250 852 183<br/>1000
> 428 859 180<br/>5000 414 892
> 190<br/>8128 389 880 194<br/>8129 388
> 891 525<br/>10000 392 838
> 554<br/>100000 332 846 562<br/>262144 321
> 852 546<br/>262145 312
> 2045 600<br/>1000000 312 2126 555<br/>10000000 331
> 4645 1228<br/><br/><br/>malloc and free in 4 threads, 5000
> times per-thread<br/> glibc tcmalloc tbbmalloc<br/>1
> 284 1629 186<br/>10 297
> 1093 143<br/>48 285 1252 151<br/>49
> 282 552 150<br/>100 283 556
> 157<br/>1000 322 510 168<br/>5000 313
> 529 162<br/>8128 312 528 173<br/>8129
> 332 597 1350<br/>10000 324 589
> 1425<br/>100000 316 535 1428<br/>262144 319
> 534 1524<br/>262145 328
> 32596 1545<br/>1000000 321 27106 1330<br/>10000000
> 323 34590 14141<br/><br/>and I can provide my test code if
> necessary.<br/><br/>for more convenient and effective, maybe you can
> use the benchmark from here,
> https://github.com/gperftools/gperftools/tree/master/benchmark.
Oops, looks like your email client doesn't handle plain text well. But
nothing that a few Emacs regexps can't fix :)
Here're fixed benchmark results.
malloc and free 10000 times per thread
size      glibc  tcmalloc  tbbmalloc
1         213us  821us     168us
10        215    820       175
50        208    832       186
100       204    808       194
500       250    852       183
1000      428    859       180
5000      414    892       190
8128      389    880       194
8129      388    891       525
10000     392    838       554
100000    332    846       562
262144    321    852       546
262145    312    2045      600
1000000   312    2126      555
10000000  331    4645      1228

malloc and free in 4 threads, 5000 times per thread
size      glibc  tcmalloc  tbbmalloc
1         284    1629      186
10        297    1093      143
48        285    1252      151
49        282    552       150
100       283    556       157
1000      322    510       168
5000      313    529       162
8128      312    528       173
8129      332    597       1350
10000     324    589       1425
100000    316    535       1428
262144    319    534       1524
262145    328    32596     1545
1000000   321    27106     1330
10000000  323    34590     14141
Well, looks like tcmalloc lags far behind the glibc, that's cool (well
for us at least :).
* Re: Re: about glibc performance
2019-03-25 7:41 ` Jimmie
2019-03-25 9:37 ` Konstantin Kharlamov
@ 2019-03-25 9:47 ` Siddhesh Poyarekar
[not found] ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
2019-03-27 21:22 ` Carlos O'Donell
1 sibling, 2 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-25 9:47 UTC (permalink / raw)
To: Jimmie; +Cc: Carlos O'Donell, libc-help
On Mon, 25 Mar 2019 at 13:11, Jimmie <zpjjimmie@163.com> wrote:
>
> It seems that I can't send attachments to libc-help, so I will simply describe my test results.
> malloc and free 10000 times per thread; the data is below (the left column is the allocation size per malloc, and the other columns are the elapsed time)
Thank you for sharing your results. While the results are very
tempting to share (because of my obvious bias as a glibc developer),
simply allocating and freeing repeatedly in each thread may not be a
sufficient test. This does show that glibc does significantly
better than tcmalloc for same-size reallocations, but not much else.
That is unless you're baking in a way to mix up the sizes and
allocations that mimics some known real-world workload(s).
If you're interested in pursuing this further, I would recommend
profiling a program like firefox or libreoffice to find
malloc/calloc/realloc/free calls and then mimicking that workload
somehow. That would be a much nicer benchmark to do this kind of
comparison.
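
As a rough, hedged sketch of what "mixing up the sizes and allocations" could mean in code: the loop below keeps a pool of live blocks and, at each step, either frees a randomly chosen block or allocates one of random size. It is a synthetic stand-in, not a recorded real-world workload; the names `run_mixed_workload`, `next_random`, and `LIVE_SLOTS` are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>

#define LIVE_SLOTS 1024  /* hypothetical upper bound on simultaneously live blocks */

/* xorshift32: a small deterministic PRNG, so that runs are repeatable. */
static uint32_t next_random(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/* Perform `operations` steps. Each step picks a random slot: an occupied
   slot is freed, an empty slot receives a block of random size in
   [1, max_size]. All remaining blocks are freed before returning.
   Returns the number of allocations performed, or -1 on bad arguments
   or allocation failure. */
long run_mixed_workload(long operations, size_t max_size, uint32_t seed)
{
    void *slots[LIVE_SLOTS] = {0};
    uint32_t state = seed ? seed : 1;  /* xorshift state must be nonzero */
    long allocations = 0;
    int failed = 0;

    if (operations < 0 || max_size == 0)
        return -1;

    for (long i = 0; i < operations && !failed; i++) {
        size_t slot = next_random(&state) % LIVE_SLOTS;
        if (slots[slot] != NULL) {
            free(slots[slot]);
            slots[slot] = NULL;
        } else {
            size_t size = 1 + next_random(&state) % max_size;
            slots[slot] = malloc(size);
            if (slots[slot] == NULL)
                failed = 1;
            else
                allocations++;
        }
    }

    for (size_t s = 0; s < LIVE_SLOTS; s++)
        free(slots[s]);  /* free(NULL) is a no-op */
    return failed ? -1 : allocations;
}
```

Timing a call such as `run_mixed_workload(1000000, 65536, 42)` under each allocator exercises size mixing and out-of-order frees, which the fixed-size loop does not. A uniform random size distribution is still an assumption; sizes and lifetimes taken from a trace of a real program would be more representative.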
Thanks,
Siddhesh
* Re: Re: Re: about glibc performance
[not found] ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
@ 2019-03-26 3:02 ` Siddhesh Poyarekar
2019-03-26 3:44 ` Jimmie
0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-26 3:02 UTC (permalink / raw)
To: Jimmie; +Cc: Carlos O'Donell, libc-help
That's really cool, thanks for doing this! Is this the latest glibc
or glibc-2.17? Also, is tcmalloc the latest one too?
As for glibc improvements, there are spikes in
bench_fastpath_stack_simple(8192) and
bench_fastpath_rnd_dependent(8192) that may be worth looking into.
Siddhesh
On Tue, 26 Mar 2019 at 07:50, Jimmie <zpjjimmie@163.com> wrote:
>
> I agree that it's not a sufficient test, so I also used the benchmark from https://github.com/gperftools/gperftools/tree/master/benchmark, which is provided by Google.
> I have attached the test results. You can download the attachment; you may want to open it with Notepad++ or a similar editor so that the columns line up.
>
> Thanks.
> Jimmie
>
>
> At 2019-03-25 17:46:51, "Siddhesh Poyarekar" <siddhesh.poyarekar@gmail.com> wrote:
> >On Mon, 25 Mar 2019 at 13:11, Jimmie <zpjjimmie@163.com> wrote:
> >>
> >> It seems that I can't send attachments to libc-help, so I will simply describe my test results.
> >> malloc and free 10000 times per thread; the data is below (the left column is the allocation size per malloc, and the other columns are the elapsed time)
> >
> >Thank you for sharing your results. While the results are very
> >tempting to share (because of my obvious bias as a glibc developer),
> >simply allocating and freeing repeatedly in per-thread may not be a
> >sufficient enough test. This does show that glibc does significantly
> >better than tcmalloc for same size reallocations, but not much else.
> >That is unless you're baking in a way to mix up the sizes and
> >allocations that mimic some known real world workload(s).
> >
> >If you're interested in pursuing this further, I would recommend
> >profiling a program like firefox or libreoffice to find
> >malloc/calloc/realloc/free calls and then mimicing that workload
> >somehow. That would be a much nicer benchmark to do this kind of
> >comparison.
> >
> >Thanks,
> >Siddhesh
>
>
>
>
>
>
> --
> Jimmie
--
http://siddhesh.in
* Re:Re: Re: Re: about glibc performance
2019-03-26 3:02 ` Siddhesh Poyarekar
@ 2019-03-26 3:44 ` Jimmie
0 siblings, 0 replies; 21+ messages in thread
From: Jimmie @ 2019-03-26 3:44 UTC (permalink / raw)
To: Siddhesh Poyarekar; +Cc: Carlos O'Donell, libc-help
glibc 2.17, and tcmalloc is the latest (gperftools 2.7, which was updated on 30 Apr 2018).
And here is my environment:
redhat 4.8.5-16
x86_64
And tcmalloc is only better in bench_fastpath_stack_simple(8192).
Jimmie
At 2019-03-26 11:02:27, "Siddhesh Poyarekar" <siddhesh.poyarekar@gmail.com> wrote:
>That's really cool, thanks for doing this! Is this the latest glibc
>or glibc-2.17? Also, is tcmalloc the latest one too?
>
>As for glibc improvements, there are spikes in
>bench_fastpath_stack_simple(8192) and
>bench_fastpath_rnd_dependent(8192) that may be worth looking into.
>
>Siddhesh
>
>On Tue, 26 Mar 2019 at 07:50, Jimmie <zpjjimmie@163.com> wrote:
>>
>> I agree that it's not a sufficient test, so I also used the benchmark from https://github.com/gperftools/gperftools/tree/master/benchmark, which is provided by Google.
>> I have attached the test results. You can download the attachment; you may want to open it with Notepad++ or a similar editor so that the columns line up.
>>
>> Thanks.
>> Jimmie
>>
>>
>> At 2019-03-25 17:46:51, "Siddhesh Poyarekar" <siddhesh.poyarekar@gmail.com> wrote:
>> >On Mon, 25 Mar 2019 at 13:11, Jimmie <zpjjimmie@163.com> wrote:
>> >>
>> >> It seems that I can't send attachments to libc-help, so I will simply describe my test results.
>> >> malloc and free 10000 times per thread; the data is below (the left column is the allocation size per malloc, and the other columns are the elapsed time)
>> >
>> >Thank you for sharing your results. While the results are very
>> >tempting to share (because of my obvious bias as a glibc developer),
>> >simply allocating and freeing repeatedly in per-thread may not be a
>> >sufficient enough test. This does show that glibc does significantly
>> >better than tcmalloc for same size reallocations, but not much else.
>> >That is unless you're baking in a way to mix up the sizes and
>> >allocations that mimic some known real world workload(s).
>> >
>> >If you're interested in pursuing this further, I would recommend
>> >profiling a program like firefox or libreoffice to find
>> >malloc/calloc/realloc/free calls and then mimicing that workload
>> >somehow. That would be a much nicer benchmark to do this kind of
>> >comparison.
>> >
>> >Thanks,
>> >Siddhesh
>>
>>
>>
>>
>>
>>
>> --
>> Jimmie
>
>
>
>--
>http://siddhesh.in
* Re: about glibc performance
2019-03-25 9:47 ` Siddhesh Poyarekar
[not found] ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
@ 2019-03-27 21:22 ` Carlos O'Donell
1 sibling, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-27 21:22 UTC (permalink / raw)
To: Siddhesh Poyarekar, Jimmie; +Cc: libc-help
On 3/25/19 5:46 AM, Siddhesh Poyarekar wrote:
> On Mon, 25 Mar 2019 at 13:11, Jimmie <zpjjimmie@163.com> wrote:
>>
>> It seems that I can't send attachments to libc-help, so I will simply describe my test results.
>> malloc and free 10000 times per thread; the data is below (the left column is the allocation size per malloc, and the other columns are the elapsed time)
>
> Thank you for sharing your results. While the results are very
> tempting to share (because of my obvious bias as a glibc developer),
> simply allocating and freeing repeatedly in per-thread may not be a
> sufficient enough test. This does show that glibc does significantly
> better than tcmalloc for same size reallocations, but not much else.
> That is unless you're baking in a way to mix up the sizes and
> allocations that mimic some known real world workload(s).
>
> If you're interested in pursuing this further, I would recommend
> profiling a program like firefox or libreoffice to find
> malloc/calloc/realloc/free calls and then mimicing that workload
> somehow. That would be a much nicer benchmark to do this kind of
> comparison.
I agree.
We need whole-system benchmarking for malloc workloads, and we need
to take into account things like:
* page touch heuristics
* cache hit/miss rates and their distributions
* inter-thread dependencies or lack of them
We have only basic tooling today for this:
https://pagure.io/glibc-malloc-trace-utils
DJ and I just split out the integrated malloc trace into a LD_PRELOAD-able
tracer now, using the same mapped-window thread-safe algorithm for
tracing. I expect we'll be able to now finish the hook deprecation and
replace the trace with this thread-safe LD_PRELOAD-able version.
--
Cheers,
Carlos.