From: Torvald Riegel <triegel@redhat.com>
To: "Jörn Engel" <joern@purestorage.com>
Cc: "GNU C. Library" <libc-alpha@sourceware.org>,
	Siddhesh Poyarekar <siddhesh.poyarekar@gmail.com>,
	Joern Engel <joern@purestorage.org>
Subject: Re: malloc: performance improvements and bugfixes
Date: Wed, 27 Jan 2016 13:20:00 -0000	[thread overview]
Message-ID: <1453900833.4592.134.camel@localhost.localdomain> (raw)
In-Reply-To: <20160126175940.GK5745@Sligo.logfs.org>

On Tue, 2016-01-26 at 09:59 -0800, Jörn Engel wrote:
> On Tue, Jan 26, 2016 at 06:35:59PM +0100, Torvald Riegel wrote:
> > On Tue, 2016-01-26 at 09:26 -0800, Jörn Engel wrote:
> > > On Tue, Jan 26, 2016 at 06:20:25PM +0100, Torvald Riegel wrote:
> > > > 
> > > > What do the allocation patterns look like?  There can be big variations
> > > > in allocation frequency and size, lifetime of allocated regions,
> > > > relation between allocations and locality, etc.  Some programs allocate
> > > > most up-front, others have lots of alloc/dealloc during the lifetime of
> > > > the program.
> > > 
> > > Lots of alloc/dealloc during the lifetime.  To give you a rough scale,
> > > malloc consumed around 1.7% cputime in the stable state.  Now it is down
> > > to about 0.7%.
> > 
> > Eventually, I think we'd like to get more detail on this, so that we
> > start tracking performance regressions too and so that our model of
> > workloads is less hand-wavy than "big application".  Given that
> > malloc will remain a general-purpose allocator (at least in the default
> > config / tuning), we'll have to choose trade-offs so that they represent
> > real workloads, which means we'll have to classify workloads in some way.
> 
> The workload itself is closed source, so the only ones ever testing with
> that workload are likely us.  Hence my handwaving.  Having some other
> application available to the general public for testing would be nice,
> though.
> 
> I suspect there isn't even a shortage of applications to choose from.
> Firefox uses jemalloc.  If you can automate some runs in firefox and
> compare jemalloc to libc malloc, you will likely find the same problems
> we encountered.  And from what I heard there is no shortage of open
> source applications that switched over to jemalloc or tcmalloc and could
> be used as well.

Please note that I was specifically talking about the *model* of
workloads.  It is true that testing with specific programs running a
certain workload (i.e., the program's workload, not malloc's) can yield a
useful data point.  But it's not sufficient if one wants to track
performance of a general-purpose allocator, unless one runs *lots* of
programs with *lots* of different program-specific workloads.

IMO we need a model of the workload that provides us with more insight
into what's actually going on in the allocator and the program -- at the
very least so that we can actually discuss which trade-offs we want to
make.  For example, "proprietary big application X was 10% faster" is a
data point but doesn't tell us anything actionable, really (except that
there is room for improvement).  First, why was it faster?  Was it due
to overheads of actual allocation going down, or was it because of how
the allocator places allocated data in the memory hierarchy, or
something else?  Second, what kinds of programs / allocation patterns
are affected by this?

Your description of the problems you saw already hinted at some of these
aspects but, for example, contained little information about the
allocation patterns and memory access patterns of the program during
runtime.  For example, considering NUMA, allocation patterns influence
where allocations end up, based on how malloc works; this and the memory
access patterns of the program then affect performance.  Once objects
share a page, you can't do much about their placement after allocation,
so page-granularity mechanisms such as kernel-level automatic NUMA
balancing have limits.  This becomes a bigger problem with larger pages.
Thus,
there won't be a malloc strategy that's always optimal for all kinds of
allocation and access patterns, so we need to understand what's going on
beyond "program X was faster".  ISTM that you should be able to discuss
the allocation and access patterns of your application without revealing
the internals of your application.

