about glibc performance

public inbox for libc-help@sourceware.org

* about glibc performance
@ 2019-03-22  3:20 Jimmie
  2019-03-22  3:38 ` Carlos O'Donell
  2019-03-22 21:39 ` Eric Wong
  0 siblings, 2 replies; 21+ messages in thread
From: Jimmie @ 2019-03-22  3:20 UTC (permalink / raw)
  To: libc-help

Hi,
For several days, I ran some tests on the memory performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my test results indicate that glibc is more efficient than tcmalloc.
Generally, people think tcmalloc is more efficient than glibc 2.3, but I use glibc 2.17, so I wonder whether glibc 2.17 made some improvements to memory performance.
Looking forward to your reply, thank you.

--

Jimmie

^ permalink raw reply	[flat|nested] 21+ messages in thread

* Re: about glibc performance
  2019-03-22  3:20 about glibc performance Jimmie
@ 2019-03-22  3:38 ` Carlos O'Donell
  2019-03-22  4:22   ` Jimmie
  2019-03-22 13:56   ` Siddhesh Poyarekar
  2019-03-22 21:39 ` Eric Wong
  1 sibling, 2 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-22  3:38 UTC (permalink / raw)
  To: Jimmie, libc-help

On 3/21/19 11:20 PM, Jimmie wrote:
> Hi, For several days, I ran some tests on the memory performance
> of glibc (2.17) and tcmalloc (gperftools 2.7), and my test results
> indicate that glibc is more efficient than tcmalloc. Generally,
> people think tcmalloc is more efficient than glibc 2.3, but I use
> glibc 2.17, so I wonder whether glibc 2.17 made some improvements
> to memory performance. Looking forward to your reply, thank you.

There were no changes in 2.17 which improved malloc performance.

In 2.23 we fixed a malloc issue with the free list becoming cyclic,
causing decreased performance (due to increased contention).
That will improve performance.

In 2.26 we added a per-thread cache to malloc, so that should
also improve performance.

-- 
Cheers,
Carlos.


* Re:Re: about glibc performance
  2019-03-22  3:38 ` Carlos O'Donell
@ 2019-03-22  4:22   ` Jimmie
  2019-03-22 12:21     ` Florian Weimer
  2019-03-22 13:56   ` Siddhesh Poyarekar
  1 sibling, 1 reply; 21+ messages in thread
From: Jimmie @ 2019-03-22  4:22 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help

Thank you, but I am unsure why Google's test report indicates that their tcmalloc is much better than glibc. Does it depend on the environment my program runs in? By the way, it is running on 'Red Hat 4.8.4, x86_64'.
It would be a great benefit to me if you could give me some advice, O(∩_∩)O.


* Re: about glibc performance
  2019-03-22  4:22   ` Jimmie
@ 2019-03-22 12:21     ` Florian Weimer
  0 siblings, 0 replies; 21+ messages in thread
From: Florian Weimer @ 2019-03-22 12:21 UTC (permalink / raw)
  To: Jimmie; +Cc: Carlos O'Donell, libc-help

* Jimmie:

> Thank you, but I am unsure why Google's test report indicates that
> their tcmalloc is much better than glibc. Does it depend on the
> environment my program runs in?

It depends on the environment and the application and its application
profile.  You may also have looked at outdated benchmark results.
Subsequent changes in tcmalloc heuristics could have made tcmalloc
perform worse in this particular benchmark.

In general, very few people report performance issues that prompt them
to switch from tcmalloc to glibc malloc because if tcmalloc does not
perform well for them, they will not switch away from glibc malloc in
the first place.


* Re: about glibc performance
  2019-03-22  3:38 ` Carlos O'Donell
  2019-03-22  4:22   ` Jimmie
@ 2019-03-22 13:56   ` Siddhesh Poyarekar
  2019-03-22 14:53     ` Carlos O'Donell
                       ` (2 more replies)
  1 sibling, 3 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-22 13:56 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: Jimmie, libc-help

On Fri, 22 Mar 2019 at 09:08, Carlos O'Donell <codonell@redhat.com> wrote:
>
> On 3/21/19 11:20 PM, Jimmie wrote:
> > Hi, For several days, I ran some tests on the memory performance
> > of glibc (2.17) and tcmalloc (gperftools 2.7), and my test results
> > indicate that glibc is more efficient than tcmalloc. Generally,
> > people think tcmalloc is more efficient than glibc 2.3, but I use
> > glibc 2.17, so I wonder whether glibc 2.17 made some improvements
> > to memory performance. Looking forward to your reply, thank you.
>
> There were no changes in 2.17 which improved malloc performance.

Actually, there were performance improvements to malloc between 2.3
and 2.17, primarily the per-thread allocator that greatly reduced
contention for multi-threaded applications.  I've argued in the past
that the per-thread allocator should bring performance of a number of
applications on par if not better than tcmalloc/jemalloc, but I never
did a formal run and so never wrote a formal rebuttal of the tcmalloc
claims.  If you've done formal tests, please do publish them!

Siddhesh
-- 
https://siddhesh.in


* Re: about glibc performance
  2019-03-22 13:56   ` Siddhesh Poyarekar
@ 2019-03-22 14:53     ` Carlos O'Donell
  2019-03-22 15:18       ` Patrick McGehearty
  2019-03-22 15:49     ` Paul Pluzhnikov via libc-help
  2019-03-25  7:41     ` Jimmie
  2 siblings, 1 reply; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-22 14:53 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Jimmie, libc-help

On 3/22/19 9:55 AM, Siddhesh Poyarekar wrote:
> On Fri, 22 Mar 2019 at 09:08, Carlos O'Donell <codonell@redhat.com> wrote:
>>
>> On 3/21/19 11:20 PM, Jimmie wrote:
>>> Hi, For several days, I ran some tests on the memory performance
>>> of glibc (2.17) and tcmalloc (gperftools 2.7), and my test results
>>> indicate that glibc is more efficient than tcmalloc. Generally,
>>> people think tcmalloc is more efficient than glibc 2.3, but I use
>>> glibc 2.17, so I wonder whether glibc 2.17 made some improvements
>>> to memory performance. Looking forward to your reply, thank you.
>>
>> There were no changes in 2.17 which improved malloc performance.
> 
> Actually, there were performance improvements to malloc between 2.3
> and 2.17, primarily the per-thread allocator that greatly reduced
> contention for multi-threaded applications.  I've argued in the past
> that the per-thread allocator should bring performance of a number of
> applications on par if not better than tcmalloc/jemalloc, but I never
> did a formal run and so never wrote a formal rebuttal of the tcmalloc
> claims.  If you've done formal tests, please do publish them!

Oh, right, certainly in 2.15 we switched on per-thread allocators!
I forgot all about that. And from 2.10 to 2.15 it could have been on
with --enable-experimental-malloc.
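
To see where a given installation sits relative to the releases named in
this thread (2.15, 2.23, 2.26), the version can be queried at run time.
A minimal Python sketch, assuming a glibc-based Linux system; the
milestone labels are only shorthand for this example, and versions are
compared as integer pairs so that 2.3 sorts before 2.17:

```python
import os
import re

# Releases mentioned in this thread; the labels are shorthand for this sketch.
MILESTONES = [
    ((2, 15), "per-thread arenas on by default"),
    ((2, 23), "cyclic free-list fix"),
    ((2, 26), "per-thread cache (tcache)"),
]


def parse_version(text):
    """Extract (major, minor) as integers from a string such as 'glibc 2.17'."""
    match = re.search(r"(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number in {text!r}")
    return int(match.group(1)), int(match.group(2))


def features_for(version):
    """Return the milestone labels included in a given (major, minor) release."""
    return [label for minimum, label in MILESTONES if version >= minimum]


def running_glibc_version():
    """Ask the C library for its version; returns None where that is not possible."""
    try:
        return parse_version(os.confstr("CS_GNU_LIBC_VERSION"))
    except (AttributeError, OSError, TypeError, ValueError):
        return None


if __name__ == "__main__":
    print("running:", running_glibc_version())
    print("glibc 2.17 has:", features_for((2, 17)))
```

For the glibc 2.17 under test, `features_for((2, 17))` lists only the
per-thread arenas entry, which fits the explanation given here.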

-- 
Cheers,
Carlos.


* Re: about glibc performance
  2019-03-22 14:53     ` Carlos O'Donell
@ 2019-03-22 15:18       ` Patrick McGehearty
  2019-03-22 16:15         ` Siddhesh Poyarekar
  0 siblings, 1 reply; 21+ messages in thread
From: Patrick McGehearty @ 2019-03-22 15:18 UTC (permalink / raw)
  To: libc-help

There have been about 80 patches which modify files in the
malloc directory of the glibc src tree between 2.17 and 2.28.
2.17 was first available around 2013 and 2.28 in 2018.
Many fixes are minor, some are not. They include fixing race
conditions, fixing boundary condition computations, and
some are performance improvements.

For the performance improvements, in addition to offering
the thread-based allocation areas, glibc malloc has also
improved the code to decide when to return memory to the
system, reducing the thrash-effect of frequent small and
medium malloc/free combinations. That is a significant benefit
for C++ and Java programs that are written in a style
which does frequent "new/allocate and free" operations.
At least one of the SPEC cpu2017 benchmarks benefits
from this improvement.

Whenever comparing malloc replacements, be sure you
have recent versions of the allocators as there has
been considerable activity in improvement of several
of these allocators over the last 5-10 years.



* Re: about glibc performance
  2019-03-22 13:56   ` Siddhesh Poyarekar
  2019-03-22 14:53     ` Carlos O'Donell
@ 2019-03-22 15:49     ` Paul Pluzhnikov via libc-help
  2019-03-22 16:13       ` Siddhesh Poyarekar
  2019-03-25  7:41     ` Jimmie
  2 siblings, 1 reply; 21+ messages in thread
From: Paul Pluzhnikov via libc-help @ 2019-03-22 15:49 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Carlos O'Donell, Jimmie, libc-help

On Fri, Mar 22, 2019 at 6:56 AM Siddhesh Poyarekar
<siddhesh.poyarekar@gmail.com> wrote:

> I've argued in the past
> that the per-thread allocator should bring performance of a number of
> applications on par if not better than tcmalloc/jemalloc, but I never
> did a formal run and so never wrote a formal rebuttal of the tcmalloc
> claims.

The claims you didn't formally rebut are probably these:
http://goog-perftools.sourceforge.net/doc/tcmalloc.html

Note that that document is from 2005, and talks about benchmarking
against GLIBC 2.3 on Intel P4 processors in 32-bit mode.

It's not like TCMalloc has stood still for the last 15 years, but I
don't believe there have been any recent open-source releases of it.

-- 
Paul Pluzhnikov


* Re: about glibc performance
  2019-03-22 15:49     ` Paul Pluzhnikov via libc-help
@ 2019-03-22 16:13       ` Siddhesh Poyarekar
  0 siblings, 0 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-22 16:13 UTC (permalink / raw)
  To: Paul Pluzhnikov; +Cc: Carlos O'Donell, Jimmie, libc-help

On Fri, 22 Mar 2019 at 21:18, Paul Pluzhnikov <ppluzhnikov@google.com> wrote:
> The claims you didn't formally rebut are probably these:
> http://goog-perftools.sourceforge.net/doc/tcmalloc.html

I know, IIRC at least 5 people had pointed me to that link back in the
day to tell me that I was wasting my time working on glibc and that I
should be working on the clouds ;)

> Note that that document is from 2005, and talks about benchmarking
> against GLIBC 2.3 on Intel P4 processors in 32-bit mode.
>
> It's not like TCMalloc has stood still for the last 15 years, but I
> don't believe there have been any recent open-source releases of it.

Agreed, I didn't mean to imply otherwise.  I mainly want to point out
that the assertion in that blog post is quite outdated and that the
comparison is a lot closer today.  Not only that, there will be
massive variations based on use cases and such evaluations may only
end up being of academic interest.

Siddhesh
-- 
http://siddhesh.in


* Re: about glibc performance
  2019-03-22 15:18       ` Patrick McGehearty
@ 2019-03-22 16:15         ` Siddhesh Poyarekar
  0 siblings, 0 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-22 16:15 UTC (permalink / raw)
  To: Patrick McGehearty; +Cc: libc-help

On Fri, 22 Mar 2019 at 20:48, Patrick McGehearty
<patrick.mcgehearty@oracle.com> wrote:
> Whenever comparing malloc replacements, be sure you
> have recent versions of the allocators as there has
> been considerable activity in improvement of several
> of these allocators over the last 5-10 years.

... and also look at various tuning options[1] in the allocator that
may potentially have significant effects on performance.

Siddhesh

[1] https://www.gnu.org/software/libc/manual/html_node/Memory-Allocation-Tunables.html#Memory-Allocation-Tunables
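
The tunables in [1] are set through the GLIBC_TUNABLES environment
variable as colon-separated name=value pairs. A minimal Python sketch
for launching a benchmark under different settings, assuming a glibc
recent enough to support tunables; `./bench` is a hypothetical benchmark
binary, and the two tunable names are taken from the manual page in [1]:

```python
import os
import subprocess


def tunables_env(settings, base_env=None):
    """Return a copy of the environment with GLIBC_TUNABLES built from a dict."""
    env = dict(os.environ if base_env is None else base_env)
    env["GLIBC_TUNABLES"] = ":".join(f"{name}={value}" for name, value in settings.items())
    return env


def run_with_tunables(argv, settings):
    """Run a command with the given malloc tunables applied to its environment."""
    return subprocess.run(argv, env=tunables_env(settings), check=True)


if __name__ == "__main__":
    # Hypothetical comparison: one arena and no thread cache versus the defaults.
    settings = {"glibc.malloc.arena_max": 1, "glibc.malloc.tcache_count": 0}
    print(tunables_env(settings, base_env={})["GLIBC_TUNABLES"])
    # run_with_tunables(["./bench"], settings)  # ./bench is an assumed binary
```

Running the same binary with and without such settings keeps everything
but the allocator configuration fixed, which makes the comparison easier
to interpret.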


* Re: about glibc performance
  2019-03-22  3:20 about glibc performance Jimmie
  2019-03-22  3:38 ` Carlos O'Donell
@ 2019-03-22 21:39 ` Eric Wong
  2019-03-22 22:20   ` Konstantin Kharlamov
  2019-03-23  2:05   ` Carlos O'Donell
  1 sibling, 2 replies; 21+ messages in thread
From: Eric Wong @ 2019-03-22 21:39 UTC (permalink / raw)
  To: Jimmie; +Cc: libc-help

Jimmie <zpjjimmie@163.com> wrote:
> Hi,
> For several days, I ran some tests on the memory performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my test results indicate that glibc is more efficient than tcmalloc.
> Generally, people think tcmalloc is more efficient than glibc 2.3, but I use glibc 2.17, so I wonder whether glibc 2.17 made some improvements to memory performance.
> Looking forward to your reply, thank you.

You may also want to check out my proof-of-concept wfcqueue patch
which optimizes message passing:

https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/

Unfortunately, integrating wfcqueue/URCU into glibc will take
much effort :<


* Re: about glibc performance
  2019-03-22 21:39 ` Eric Wong
@ 2019-03-22 22:20   ` Konstantin Kharlamov
  2019-03-22 22:38     ` Eric Wong
  2019-03-23  2:05   ` Carlos O'Donell
  1 sibling, 1 reply; 21+ messages in thread
From: Konstantin Kharlamov @ 2019-03-22 22:20 UTC (permalink / raw)
  To: Eric Wong; +Cc: Jimmie, libc-help



On Sat, Mar 23, 2019 at 12:38 AM, Eric Wong
<normalperson@yhbt.net> wrote:
> Jimmie <zpjjimmie@163.com> wrote:
>>  Hi,
>>  For several days, I ran some tests on the memory performance of
>> glibc (2.17) and tcmalloc (gperftools 2.7), and my test results
>> indicate that glibc is more efficient than tcmalloc.
>>  Generally, people think tcmalloc is more efficient than glibc 2.3,
>> but I use glibc 2.17, so I wonder whether glibc 2.17 made some
>> improvements to memory performance.
>>  Looking forward to your reply, thank you.
> 
> You may also want to check out my proof-of-concept wfcqueue patch
> which optimizes message passing:
> 
> https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/
> 
> Unfortunately, integrating wfcqueue/URCU into glibc will take
> much effort :<

Also in future, I think, "restartable sequences" should improve malloc 
performance, right? I'm wondering, when will they appear…



* Re: about glibc performance
  2019-03-22 22:20   ` Konstantin Kharlamov
@ 2019-03-22 22:38     ` Eric Wong
  2019-03-23  1:49       ` Carlos O'Donell
  0 siblings, 1 reply; 21+ messages in thread
From: Eric Wong @ 2019-03-22 22:38 UTC (permalink / raw)
  To: Konstantin Kharlamov; +Cc: Jimmie, libc-help

Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
> Also in future, I think, "restartable sequences" should improve malloc
> performance, right? I'm wondering, when will they appear…

Yes, they're complementary ideas.  Looks like
rseq is being discussed in libc-alpha, lately.


* Re: about glibc performance
  2019-03-22 22:38     ` Eric Wong
@ 2019-03-23  1:49       ` Carlos O'Donell
  0 siblings, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-23  1:49 UTC (permalink / raw)
  To: Eric Wong, Konstantin Kharlamov; +Cc: Jimmie, libc-help

On 3/22/19 6:38 PM, Eric Wong wrote:
> Konstantin Kharlamov <hi-angel@yandex.ru> wrote:
>> Also in future, I think, "restartable sequences" should improve malloc
>> performance, right? I'm wondering, when will they appear…
> 
> Yes, they're complementary ideas.  Looks like
> rseq is being discussed in libc-alpha, lately.

I just did a review of the rseq patches, and I think the
biggest sticking point is the registration ref-count
interface, but Mathieu has proposed something even simpler
that should work, and now we'll try to get buy in from
the various senior reviewers.

-- 
Cheers,
Carlos.


* Re: about glibc performance
  2019-03-22 21:39 ` Eric Wong
  2019-03-22 22:20   ` Konstantin Kharlamov
@ 2019-03-23  2:05   ` Carlos O'Donell
  1 sibling, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-23  2:05 UTC (permalink / raw)
  To: Eric Wong, Jimmie; +Cc: libc-help

On 3/22/19 5:38 PM, Eric Wong wrote:
> Jimmie <zpjjimmie@163.com> wrote:
>> Hi, For several days, I ran some tests on the memory
>> performance of glibc (2.17) and tcmalloc (gperftools 2.7), and my
>> test results indicate that glibc is more efficient than tcmalloc.
>> Generally, people think tcmalloc is more efficient than glibc 2.3,
>> but I use glibc 2.17, so I wonder whether glibc 2.17 made some
>> improvements to memory performance. Looking forward to your reply,
>> thank you.
> 
> You may also want to check out my proof-of-concept wfcqueue patch 
> which optimizes message passing:
> 
> https://public-inbox.org/libc-alpha/20180731084936.g4yw6wnvt677miti@dcvr/
>
>  Unfortunately, integrating wfcqueue/URCU into glibc will take much
> effort :<

FAOD I really like the direction these patches take glibc's malloc.
I just don't have time to push your idea to completion. I know it's
a lot of effort to push novel ideas forward, but it touches
pieces of code which are used by a lot of programs. The best tooling
we have today is malloc trace/simulation to capture and compare
workloads before and after:
https://pagure.io/glibc-malloc-trace-utils

So the hard part is not the changes to the code, but it's in doing
the before/after performance comparison for a variety of workloads
and capturing those workloads with the tracer so others can look
at and run them. We don't even have a good place to store large
workloads (we're talking to overseers to see if we can enable some
git-annex support on sourceware, git lfs is still too new).

-- 
Cheers,
Carlos.


* Re:Re: about glibc performance
  2019-03-22 13:56   ` Siddhesh Poyarekar
  2019-03-22 14:53     ` Carlos O'Donell
  2019-03-22 15:49     ` Paul Pluzhnikov via libc-help
@ 2019-03-25  7:41     ` Jimmie
  2019-03-25  9:37       ` Konstantin Kharlamov
  2019-03-25  9:47       ` Siddhesh Poyarekar
  2 siblings, 2 replies; 21+ messages in thread
From: Jimmie @ 2019-03-25  7:41 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Carlos O'Donell, libc-help

It seems that I can't send attachments to libc-help, so I will simply describe my test results.

malloc and free 10000 times per thread. The data is below (the left column is the size of each malloc; the other columns are the time taken):

size      glibc   tcmalloc  tbbmalloc
1         213us   821us     168us
10        215     820       175
50        208     832       186
100       204     808       194
500       250     852       183
1000      428     859       180
5000      414     892       190
8128      389     880       194
8129      388     891       525
10000     392     838       554
100000    332     846       562
262144    321     852       546
262145    312     2045      600
1000000   312     2126      555
10000000  331     4645      1228

malloc and free in 4 threads, 5000 times per thread:

size      glibc   tcmalloc  tbbmalloc
1         284     1629      186
10        297     1093      143
48        285     1252      151
49        282     552       150
100       283     556       157
1000      322     510       168
5000      313     529       162
8128      312     528       173
8129      332     597       1350
10000     324     589       1425
100000    316     535       1428
262144    319     534       1524
262145    328     32596     1545
1000000   321     27106     1330
10000000  323     34590     14141

I can provide my test code if necessary.

For something more convenient and effective, maybe you can use the benchmark from here: https://github.com/gperftools/gperftools/tree/master/benchmark.


* Re:Re: about glibc performance
  2019-03-25  7:41     ` Jimmie
@ 2019-03-25  9:37       ` Konstantin Kharlamov
  2019-03-25  9:47       ` Siddhesh Poyarekar
  1 sibling, 0 replies; 21+ messages in thread
From: Konstantin Kharlamov @ 2019-03-25  9:37 UTC (permalink / raw)
  To: Jimmie; +Cc: Siddhesh Poyarekar, Carlos O'Donell, libc-help



On Mon, Mar 25, 2019 at 10:37:11, Jimmie <zpjjimmie@163.com> wrote:
> It seems that I can't send attachments to libc-help, so I will simply
> describe my test results. malloc and free 10000 times per thread; the
> data is below (the left column is the size of each malloc, the other
> columns are the time taken).
> [...]
> I can provide my test code if necessary.
>
> For something more convenient and effective, maybe you can use the
> benchmark from here:
> https://github.com/gperftools/gperftools/tree/master/benchmark.

Oops, looks like your email client doesn't handle plain text well. But 
nothing that a few Emacs regexps can't fix :)
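
Results pasted through an HTML mail client tend to arrive with `<br/>`
tags in place of newlines and a mix of tabs and spaces between columns.
A minimal sketch of that kind of regexp cleanup, with Python's `re`
standing in for Emacs; the sample string is abbreviated from the first
two data rows, and the function names are made up for this example:

```python
import re


def br_text_to_rows(raw):
    """Split <br/>-separated text into rows of whitespace-separated fields."""
    lines = re.split(r"<br\s*/?>", raw)
    rows = [re.split(r"\s+", line.strip()) for line in lines]
    # Consecutive or trailing <br/> tags produce empty rows; drop them.
    return [row for row in rows if row != [""]]


def format_rows(rows, width=10):
    """Left-align every field in a fixed-width column."""
    return "\n".join(
        "".join(f"{field:<{width}}" for field in row).rstrip() for row in rows
    )


if __name__ == "__main__":
    sample = (
        "1\t            213us\t          821us\t168us<br/>"
        "10\t            215\t          820\t175<br/><br/>"
    )
    print(format_rows(br_text_to_rows(sample)))
```

The header row of the pasted table has an empty first column, so it
comes out one field short and still needs a "size" label added by hand.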

Here're fixed benchmark results.

size      glibc   tcmalloc  tbbmalloc
1         213us   821us     168us
10        215     820       175
50        208     832       186
100       204     808       194
500       250     852       183
1000      428     859       180
5000      414     892       190
8128      389     880       194
8129      388     891       525
10000     392     838       554
100000    332     846       562
262144    321     852       546
262145    312     2045      600
1000000   312     2126      555
10000000  331     4645      1228


malloc and free in 4 threads, 5000 times per-thread
size      glibc   tcmalloc  tbbmalloc
1         284     1629      186
10        297     1093      143
48        285     1252      151
49        282     552       150
100       283     556       157
1000      322     510       168
5000      313     529       162
8128      312     528       173
8129      332     597       1350
10000     324     589       1425
100000    316     535       1428
262144    319     534       1524
262145    328     32596     1545
1000000   321     27106     1330
10000000  323     34590     14141

Well, looks like tcmalloc lags far behind the glibc, that's cool (well 
for us at least :).
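
To put numbers on "lags far behind", the first table can be turned into
ratios against glibc, and the sizes where an allocator's time suddenly
jumps can be picked out. A small Python sketch; the rows are copied from
the first table above, the times are assumed to be microseconds for the
whole run, and the factor of 2 used to flag a jump is an arbitrary
choice:

```python
# (size, glibc, tcmalloc, tbbmalloc), copied from the first table above.
ROWS = [
    (1, 213, 821, 168),
    (10, 215, 820, 175),
    (50, 208, 832, 186),
    (100, 204, 808, 194),
    (500, 250, 852, 183),
    (1000, 428, 859, 180),
    (5000, 414, 892, 190),
    (8128, 389, 880, 194),
    (8129, 388, 891, 525),
    (10000, 392, 838, 554),
    (100000, 332, 846, 562),
    (262144, 321, 852, 546),
    (262145, 312, 2045, 600),
    (1000000, 312, 2126, 555),
    (10000000, 331, 4645, 1228),
]


def ratio_to_glibc(rows):
    """Per size: time of tcmalloc and tbbmalloc divided by the glibc time."""
    return [(size, round(tc / g, 2), round(tbb / g, 2)) for size, g, tc, tbb in rows]


def threshold_jumps(rows, column, factor=2.0):
    """Sizes at which the time in `column` is at least `factor` times the previous row's."""
    return [
        cur[0]
        for prev, cur in zip(rows, rows[1:])
        if cur[column] >= factor * prev[column]
    ]


if __name__ == "__main__":
    for size, tc, tbb in ratio_to_glibc(ROWS):
        print(f"{size:>9}  tcmalloc x{tc:<6} tbbmalloc x{tbb}")
    print("tcmalloc jumps at:", threshold_jumps(ROWS, 2))
    print("tbbmalloc jumps at:", threshold_jumps(ROWS, 3))
```

The jumps land right after 8128 and 262144 bytes, which look like
size-class boundaries in tbbmalloc and tcmalloc respectively; that
reading is an assumption, not something the numbers alone establish.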



* Re: Re: about glibc performance
  2019-03-25  7:41     ` Jimmie
  2019-03-25  9:37       ` Konstantin Kharlamov
@ 2019-03-25  9:47       ` Siddhesh Poyarekar
       [not found]         ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
  2019-03-27 21:22         ` Carlos O'Donell
  1 sibling, 2 replies; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-25  9:47 UTC (permalink / raw)
  To: Jimmie; +Cc: Carlos O'Donell, libc-help

On Mon, 25 Mar 2019 at 13:11, Jimmie <zpjjimmie@163.com> wrote:
>
> It seems that I can't send attachments to libc-help, so I will simply describe my test results. malloc and free 10000 times per thread; the data is below (the left column is the size of each malloc, the other columns are the time taken)

Thank you for sharing your results.  While the results are very
tempting to share (because of my obvious bias as a glibc developer),
simply allocating and freeing repeatedly in per-thread may not be a
sufficient enough test.  This does show that glibc does significantly
better than tcmalloc for same size reallocations, but not much else.
That is unless you're baking in a way to mix up the sizes and
allocations that mimic some known real world workload(s).

If you're interested in pursuing this further, I would recommend
profiling a program like firefox or libreoffice to find
malloc/calloc/realloc/free calls and then mimicking that workload
somehow.  That would be a much nicer benchmark to do this kind of
comparison.
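
One way to mix up sizes and lifetimes is to generate a reproducible
trace of malloc/free operations and replay it against whichever
allocator is loaded. A rough Python sketch, assuming a Linux system
where `ctypes.CDLL(None)` exposes the process's malloc and free; the
trace format, the size list and the 50/50 malloc/free split are all
invented for illustration and do not model any real application:

```python
import ctypes
import random
import time


def generate_trace(n_ops, sizes, seed=0, max_live=64):
    """Build a reproducible list of ('malloc', slot, size) / ('free', slot, 0) ops."""
    rng = random.Random(seed)
    live = []                            # slots that currently hold an allocation
    free_slots = list(range(max_live))   # slots available for a new allocation
    trace = []
    for _ in range(n_ops):
        if live and (not free_slots or rng.random() < 0.5):
            # Free a randomly chosen live allocation, not just the latest one.
            slot = live.pop(rng.randrange(len(live)))
            free_slots.append(slot)
            trace.append(("free", slot, 0))
        else:
            slot = free_slots.pop()
            live.append(slot)
            trace.append(("malloc", slot, rng.choice(sizes)))
    # Release whatever is still allocated so the trace is balanced.
    for slot in live:
        trace.append(("free", slot, 0))
    return trace


def replay(trace):
    """Replay a trace through the process's malloc/free; returns elapsed seconds."""
    libc = ctypes.CDLL(None)
    libc.malloc.restype = ctypes.c_void_p
    libc.malloc.argtypes = [ctypes.c_size_t]
    libc.free.restype = None
    libc.free.argtypes = [ctypes.c_void_p]
    slots = {}
    start = time.perf_counter()
    for op, slot, size in trace:
        if op == "malloc":
            slots[slot] = libc.malloc(size)
        else:
            libc.free(slots.pop(slot))
    return time.perf_counter() - start


if __name__ == "__main__":
    trace = generate_trace(100000, [16, 100, 5000, 300000], seed=1)
    print(f"{len(trace)} operations in {replay(trace):.3f}s")
```

Python's call overhead dominates the timing here, so this only shows the
shape of such a workload; a real measurement would replay the same trace
from C, ideally with sizes taken from a profiled application.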

Thanks,
Siddhesh


* Re: Re: Re: about glibc performance
       [not found]         ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
@ 2019-03-26  3:02           ` Siddhesh Poyarekar
  2019-03-26  3:44             ` Jimmie
  0 siblings, 1 reply; 21+ messages in thread
From: Siddhesh Poyarekar @ 2019-03-26  3:02 UTC (permalink / raw)
  To: Jimmie; +Cc: Carlos O'Donell, libc-help

That's really cool, thanks for doing this!  Is this the latest glibc
or glibc-2.17?  Also, is tcmalloc the latest one too?

As for glibc improvements, there are spikes in
bench_fastpath_stack_simple(8192) and
bench_fastpath_rnd_dependent(8192) that may be worth looking into.

Siddhesh

On Tue, 26 Mar 2019 at 07:50, Jimmie <zpjjimmie@163.com> wrote:
>
> I agree that it's not a sufficient test, so I also used the benchmark from https://github.com/gperftools/gperftools/tree/master/benchmark, which is provided by Google.
> I have also attached the test results. You can download the attachment; you may want to open it with Notepad++ or a similar editor so that it is laid out nicely.
>
> Thanks.
> Jimmie
>
> --
> Jimmie



-- 
http://siddhesh.in


* Re:Re: Re: Re: about glibc performance
  2019-03-26  3:02           ` Siddhesh Poyarekar
@ 2019-03-26  3:44             ` Jimmie
  0 siblings, 0 replies; 21+ messages in thread
From: Jimmie @ 2019-03-26  3:44 UTC (permalink / raw)
  To: Siddhesh Poyarekar; +Cc: Carlos O'Donell, libc-help

glibc 2.17, and tcmalloc is the latest (gperftools 2.7, which was updated on 30 Apr 2018).


And here is my environment:
redhat 4.8.5-16
x86_64


And tcmalloc is only better in bench_fastpath_stack_simple(8192).


Jimmie




At 2019-03-26 11:02:27, "Siddhesh Poyarekar" <siddhesh.poyarekar@gmail.com> wrote:
>That's really cool, thanks for doing this!  Is this the latest glibc
>or glibc-2.17?  Also, is tcmalloc the latest one too?
>
>As for glibc improvements, there are spikes in
>bench_fastpath_stack_simple(8192) and
>bench_fastpath_rnd_dependent(8192) that may be worth looking into.
>
>Siddhesh


* Re: about glibc performance
  2019-03-25  9:47       ` Siddhesh Poyarekar
       [not found]         ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
@ 2019-03-27 21:22         ` Carlos O'Donell
  1 sibling, 0 replies; 21+ messages in thread
From: Carlos O'Donell @ 2019-03-27 21:22 UTC (permalink / raw)
  To: Siddhesh Poyarekar, Jimmie; +Cc: libc-help

On 3/25/19 5:46 AM, Siddhesh Poyarekar wrote:
> On Mon, 25 Mar 2019 at 13:11, Jimmie <zpjjimmie@163.com> wrote:
>>
>> It seems that I can't send attachments to libc-help, so I will simply describe my test results.
>> I called malloc and free 10000 times in each thread; the data is below (the left column is the memory size per malloc, and the other column shows the time taken).
> 
> Thank you for sharing your results.  While the results are very
> tempting to share (because of my obvious bias as a glibc developer),
> simply allocating and freeing repeatedly in each thread may not be a
> sufficient test.  This does show that glibc does significantly
> better than tcmalloc for same size reallocations, but not much else.
> That is unless you're baking in a way to mix up the sizes and
> allocations that mimic some known real world workload(s).
> 
> If you're interested in pursuing this further, I would recommend
> profiling a program like firefox or libreoffice to find
> malloc/calloc/realloc/free calls and then mimicking that workload
> somehow.  That would be a much nicer benchmark to do this kind of
> comparison.

I agree.
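
As a rough illustration of the kind of mixed-size churn described above,
here is a minimal sketch in C. It is not taken from any benchmark in this
thread: the function name mixed_size_churn, the SLOTS and MIN_SIZE
constants, and the xorshift generator are all assumptions made for the
example. Each step frees a randomly chosen live block and replaces it with
a block of a random size, then writes to the new block so that its pages
are actually touched.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

#define SLOTS    1024   /* live allocations kept at once (assumed value) */
#define MIN_SIZE 16     /* smallest request in bytes (assumed value) */

/* Small xorshift PRNG so that runs are reproducible across allocators. */
static uint32_t next_rand(uint32_t *state)
{
    uint32_t x = *state;
    x ^= x << 13;
    x ^= x >> 17;
    x ^= x << 5;
    *state = x;
    return x;
}

/*
 * Run `iterations` steps of mixed-size allocation churn.
 * Each step picks a random slot, frees the block stored there (if any),
 * allocates a block of random size in [MIN_SIZE, max_size], and writes
 * to every byte of it.  Returns the number of successful allocations.
 */
size_t mixed_size_churn(size_t iterations, size_t max_size, uint32_t seed)
{
    void *slots[SLOTS] = { 0 };
    uint32_t state = seed ? seed : 1;   /* xorshift state must be non-zero */
    size_t ok = 0;

    if (max_size < MIN_SIZE)
        max_size = MIN_SIZE;

    for (size_t i = 0; i < iterations; i++) {
        size_t slot = next_rand(&state) % SLOTS;
        size_t size = MIN_SIZE + next_rand(&state) % (max_size - MIN_SIZE + 1);

        free(slots[slot]);              /* free(NULL) is a no-op */
        slots[slot] = malloc(size);
        if (slots[slot] != NULL) {
            memset(slots[slot], 0xA5, size);   /* touch the memory */
            ok++;
        }
    }

    /* Release whatever is still live at the end of the run. */
    for (size_t i = 0; i < SLOTS; i++)
        free(slots[i]);

    return ok;
}
```

To compare allocators one would time a call such as
mixed_size_churn(1000000, 8192, 42) in each thread and run the same binary
with each allocator. This is still a synthetic loop, so it does not replace
a workload derived from a real program.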

We need whole-system benchmarking for malloc workloads, and we need
to take into account things like:

* page touch heuristics
* cache hit/miss rates and their distributions
* inter-thread dependencies or lack of them

We have only basic tooling today for this:
https://pagure.io/glibc-malloc-trace-utils

DJ and I just split out the integrated malloc trace into a LD_PRELOAD-able
tracer now, using the same mapped-window thread-safe algorithm for
tracing. I expect we'll be able to now finish the hook deprecation and
replace the trace with this thread-safe LD_PRELOAD-able version.
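
Once malloc/calloc/realloc/free calls have been recorded, mimicking the
workload comes down to replaying them in order. The sketch below assumes a
hypothetical in-memory event format (struct trace_event, with a slot index
standing in for each recorded pointer). It is not the trace format or the
replay code used by glibc-malloc-trace-utils, only an illustration of the
idea.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical event kinds; a real trace format will differ. */
enum trace_op { OP_MALLOC, OP_CALLOC, OP_REALLOC, OP_FREE };

struct trace_event {
    enum trace_op op;
    size_t slot;    /* index standing in for the recorded pointer */
    size_t size;    /* request size in bytes; ignored for OP_FREE */
};

/*
 * Replay `n` events against the `slots` table (which must start out
 * zero-initialized and hold `nslots` entries).  Returns the number of
 * events applied.  Replay stops early on an out-of-range slot, on a
 * zero-size allocation request (left out of this sketch), or when the
 * allocator returns NULL.
 */
size_t replay_trace(const struct trace_event *ev, size_t n,
                    void **slots, size_t nslots)
{
    size_t done = 0;

    for (size_t i = 0; i < n; i++) {
        const struct trace_event *e = &ev[i];
        void *p;

        if (e->slot >= nslots)
            break;

        if (e->op == OP_FREE) {
            free(slots[e->slot]);
            slots[e->slot] = NULL;
            done++;
            continue;
        }

        if (e->size == 0)
            break;

        if (e->op == OP_REALLOC) {
            /* On failure the old block stays valid and stays in the slot. */
            p = realloc(slots[e->slot], e->size);
        } else {
            free(slots[e->slot]);       /* avoid leaking a reused slot */
            slots[e->slot] = NULL;
            p = (e->op == OP_CALLOC) ? calloc(1, e->size) : malloc(e->size);
        }

        if (p == NULL)
            break;

        slots[e->slot] = p;
        done++;
    }

    return done;
}
```

Timing replay_trace over a trace captured from a real program, once per
allocator, would give a comparison that reflects that program's mix of
sizes and lifetimes rather than a fixed-size loop.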

-- 
Cheers,
Carlos.


end of thread, other threads:[~2019-03-27 21:22 UTC | newest]

Thread overview: 21+ messages
2019-03-22  3:20 about glibc performance Jimmie
2019-03-22  3:38 ` Carlos O'Donell
2019-03-22  4:22   ` Jimmie
2019-03-22 12:21     ` Florian Weimer
2019-03-22 13:56   ` Siddhesh Poyarekar
2019-03-22 14:53     ` Carlos O'Donell
2019-03-22 15:18       ` Patrick McGehearty
2019-03-22 16:15         ` Siddhesh Poyarekar
2019-03-22 15:49     ` Paul Pluzhnikov via libc-help
2019-03-22 16:13       ` Siddhesh Poyarekar
2019-03-25  7:41     ` Jimmie
2019-03-25  9:37       ` Konstantin Kharlamov
2019-03-25  9:47       ` Siddhesh Poyarekar
     [not found]         ` <1e77c218.6a0c.169b7c2ab4c.Coremail.zpjjimmie@163.com>
2019-03-26  3:02           ` Siddhesh Poyarekar
2019-03-26  3:44             ` Jimmie
2019-03-27 21:22         ` Carlos O'Donell
2019-03-22 21:39 ` Eric Wong
2019-03-22 22:20   ` Konstantin Kharlamov
2019-03-22 22:38     ` Eric Wong
2019-03-23  1:49       ` Carlos O'Donell
2019-03-23  2:05   ` Carlos O'Donell
