public inbox for glibc-bugs@sourceware.org
From: "nicolas at freedelity dot be" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/30579] New: trim_threshold in realloc lead to high memory usage
Date: Thu, 22 Jun 2023 13:54:08 +0000
Message-ID: <bug-30579-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

            Bug ID: 30579
           Summary: trim_threshold in realloc lead to high memory usage
           Product: glibc
           Version: 2.37
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: nicolas at freedelity dot be
  Target Milestone: ---

The recent use of trim_threshold in realloc (commit
f4f2ca1509288f6f780af50659693a89949e7e46:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f4f2ca1509288f6f780af50659693a89949e7e46)
prevents unused memory from being reclaimed in some scenarios.

The scenario affecting us is that we need to normalize a lot of strings
(around 20M) whose final size after normalization is unknown; we only know an
upper bound of 1K bytes. So the memory is preallocated at the maximum size and
then shrunk to the final size, which avoids many reallocations during string
construction.

So during normalization, the memory for a given string is preallocated to 1024
bytes and then reallocated down to its final size. The default value for
trim_threshold is 128K, and the slack left in each block (at most about 1K) is
far below it, so this memory is never reclaimed. The process ends up using
much more memory than necessary.
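
One possible way to avoid depending on realloc shrinking the block at all,
given here only as a rough sketch and not as something taken from the affected
code: normalize into a single reusable scratch buffer of the maximum size, then
copy the result into an allocation of exactly the final size. The helper name
`copy_exact` is hypothetical.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: duplicate the first `len` bytes of `scratch` into a
   freshly allocated, NUL-terminated block of exactly len + 1 bytes.
   Returns NULL on allocation failure; the caller frees the result. */
char *copy_exact(const char *scratch, size_t len)
{
  assert(scratch != NULL);
  char *out = malloc(len + 1);
  if (out == NULL)
    return NULL;
  memcpy(out, scratch, len);
  out[len] = '\0';
  return out;
}
```

The trade-off is one extra copy per string in exchange for never holding an
oversized block per string.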

The following test program reproduces the issue:

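```
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
  int i;
  int count = 1000000;
  char **strings = malloc(sizeof(char *) * count);
  if (strings == NULL)
    return 1;

  for (i = 0; i < count; i++) {
    /* Preallocate the maximum size, then shrink to the final size. */
    char *s = malloc(sizeof(char) * 1000);
    if (s == NULL)
      return 1;
    strcpy(s, "hello");
    /* 6 bytes: "hello" plus the terminating NUL. */
    strings[i] = realloc(s, sizeof(char) * 6);
    if (strings[i] == NULL)
      return 1;
  }

  /* Keep the process alive so its memory usage can be inspected. */
  while (1) {
    sleep(100);
  }

  return 0;
}
```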

It constructs 1M strings set to "hello", preallocating a 1000-byte array for
each string to emulate the fact that we do not know its size in advance. The
correct size is then set with a call to realloc.

Up to glibc 2.36, the memory was correctly reclaimed when calling realloc, and
the program ends up consuming around 40 MB.
Starting with glibc 2.37 and the commit mentioned above, memory usage grows to
about 1 GB.
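
A quick way to observe the difference for a single block, as a sketch only: the
glibc extension malloc_usable_size reports how many bytes a block can actually
hold, so calling it after the shrinking realloc should show whether the tail
was split off. The helper name `usable_after_shrink` is hypothetical, and I
have not recorded the exact values on each version.

```c
#include <assert.h>
#include <malloc.h>   /* malloc_usable_size (glibc extension) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate `prealloc` bytes, shrink the block to
   `final_size` with realloc, and return the usable size of the result.
   Returns 0 on allocation failure. */
size_t usable_after_shrink(size_t prealloc, size_t final_size)
{
  assert(final_size > 0 && final_size <= prealloc);
  char *p = malloc(prealloc);
  if (p == NULL)
    return 0;
  memset(p, 'x', final_size);
  char *q = realloc(p, final_size);
  if (q == NULL) {
    free(p);
    return 0;
  }
  size_t usable = malloc_usable_size(q);
  free(q);
  return usable;
}
```

Calling `usable_after_shrink(1000, 6)` would be expected to return a value
close to 1000 when realloc leaves the block untouched, and a much smaller one
when the unused tail is given back to the allocator.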

I'm not an expert, but reusing trim_threshold as a heuristic to decide whether
to reclaim memory does not seem a good fit for heap memory.
From reading the comments, I would be tempted to set
glibc.malloc.trim_threshold (via GLIBC_TUNABLES) to a high value for
performance reasons, but now we are more or less forced to set it to a very
low value to avoid consuming too much memory.

In my opinion, the heuristic should probably be based on an independent value.
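
For reference, a minimal sketch of lowering the threshold from inside the
program instead of through the environment, using mallopt with
M_TRIM_THRESHOLD from <malloc.h>. The helper name `set_trim_threshold` is
hypothetical, and it is an assumption on my part that the realloc check in 2.37
reacts to a value set this way exactly as it does to the tunable.

```c
#include <malloc.h>  /* mallopt, M_TRIM_THRESHOLD (glibc) */

/* Hypothetical helper: set the malloc trim threshold for this process.
   Call it early, before the allocations it should affect.
   mallopt returns 1 on success and 0 on error. */
int set_trim_threshold(int bytes)
{
  return mallopt(M_TRIM_THRESHOLD, bytes);
}
```

The environment equivalent would be something like
GLIBC_TUNABLES=glibc.malloc.trim_threshold=4096 in front of the command, where
4096 is only an illustrative value. Either way this is the workaround described
above, with the drawback that the same setting also controls when the top of
the heap is trimmed.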

Thread overview: 9+ messages
2023-06-22 13:54 nicolas at freedelity dot be [this message]
2023-06-22 14:20 ` [Bug malloc/30579] " siddhesh at sourceware dot org
2023-06-22 15:10 ` nicolas at freedelity dot be
2023-06-28  7:56 ` nicolas at freedelity dot be
2023-06-28 16:31 ` siddhesh at sourceware dot org
2023-07-06 15:38 ` cvs-commit at gcc dot gnu.org
2023-07-06 15:40 ` cvs-commit at gcc dot gnu.org
2023-07-06 15:42 ` siddhesh at sourceware dot org
2023-07-06 16:09 ` nicolas at freedelity dot be
