public inbox for glibc-bugs@sourceware.org
* [Bug malloc/30579] New: trim_threshold in realloc lead to high memory usage
From: nicolas at freedelity dot be @ 2023-06-22 13:54 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

            Bug ID: 30579
           Summary: trim_threshold in realloc lead to high memory usage
           Product: glibc
           Version: 2.37
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: nicolas at freedelity dot be
  Target Milestone: ---

The recent use of trim_threshold in realloc (commit
f4f2ca1509288f6f780af50659693a89949e7e46:
https://sourceware.org/git/?p=glibc.git;a=commitdiff;h=f4f2ca1509288f6f780af50659693a89949e7e46)
prevents unused memory from being reclaimed in some scenarios.

The scenario affecting us is that we need to normalize a lot of
strings (around 20M) whose final size after normalization is unknown (we only
know an upper bound of 1K bytes). So the memory is preallocated to the
maximum size and then shrunk to the final size, which avoids many
reallocations during string construction.

So during normalization, the memory for a given string is preallocated to 1024
bytes and then reallocated down to its final size. The default value for
trim_threshold is 128K, so this memory is never reclaimed. The process ends up
using much more memory than necessary.

The following test program reproduces the issue:

```
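#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void) {
  int i;
  int count = 1000000;
  char ** strings = malloc(sizeof(char*) * count);

  for(i=0; i<count; i++){
    /* Preallocate more than needed, as the final size is unknown. */
    strings[i] = malloc(sizeof(char) * 1000);
    strcpy(strings[i], "hello");
    /* Shrink to the exact size: "hello" plus the terminating NUL. */
    strings[i] = realloc(strings[i], sizeof(char) * 6);
  }

  /* Keep the process alive so its memory usage can be inspected. */
  while(1) {
    sleep(100);
  }

  return 0;
}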
```

It constructs 1M strings set to "hello" by preallocating a 1000-byte array for
each string, to emulate the fact that we do not know its size in advance. The
correct size is then set with a call to realloc.

Up to glibc 2.36, the memory was correctly reclaimed when calling realloc and
we ended up consuming around 40 MB.
Starting with glibc 2.37 and the mentioned commit, memory usage climbs to about
1 GB.
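
As a rough cross-check of those two figures, the arithmetic can be sketched as
below. This is only an estimate under stated assumptions: a 64-bit glibc
layout with an 8-byte size header, 16-byte alignment and a 32-byte minimum
chunk. The constants and the helper names chunk_size and footprint are
illustrative and are not taken from the report.

```c
#include <stddef.h>

/* Assumed 64-bit glibc chunk layout (illustrative constants). */
#define HEADER_BYTES  8u   /* per-chunk size field */
#define ALIGN_BYTES  16u   /* chunk alignment */
#define MIN_CHUNK    32u   /* smallest chunk the allocator hands out */

/* Approximate size of the chunk backing a request of `req` bytes. */
static size_t chunk_size(size_t req)
{
    size_t sz = (req + HEADER_BYTES + ALIGN_BYTES - 1)
                & ~(size_t)(ALIGN_BYTES - 1);
    return sz < MIN_CHUNK ? MIN_CHUNK : sz;
}

/* Estimated footprint of `count` strings of `per_string_req` bytes each,
   plus the array of pointers that refers to them. */
static size_t footprint(size_t count, size_t per_string_req)
{
    return count * chunk_size(per_string_req)
           + chunk_size(count * sizeof(char *));
}
```

Under these assumptions, footprint(1000000, 6) comes to roughly 40 MB (strings
shrunk to fit) and footprint(1000000, 1000) to roughly 1 GB (strings left at
their preallocated size), in line with the numbers observed above.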

I'm not an expert, but reusing trim_threshold as a heuristic to decide whether
to reclaim memory does not seem a good fit for heap memory.
Reading the comments, I would be tempted to set glibc.malloc.trim_threshold
(via GLIBC_TUNABLES) to a high value for performance reasons, but now we
are more or less forced to set it to a very low value to avoid consuming too
much memory.

In my opinion, the heuristic should probably be based on an independent value.

-- 
You are receiving this mail because:
You are on the CC list for the bug.


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: siddhesh at sourceware dot org @ 2023-06-22 14:20 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

Siddhesh Poyarekar <siddhesh at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |siddhesh at sourceware dot org

--- Comment #1 from Siddhesh Poyarekar <siddhesh at sourceware dot org> ---
(In reply to Nicolas Dusart from comment #0)
> I'm not an expert, but reusing trim_threshold as a heuristic to decide
> whether to reclaim memory does not seem a good fit for heap memory.

That's the entire point of trim_threshold: to control fragmentation tolerance
for heap memory.

> Reading the comments, I would be tempted to set
> glibc.malloc.trim_threshold (via GLIBC_TUNABLES) to a high value for
> performance reasons, but now we are more or less forced to set it to a very
> low value to avoid consuming too much memory.

There are multiple ways to reach this kind of fragmentation, I'm afraid, and
setting trim_threshold (and mmap_threshold, for that matter) is the way to
customize the allocator to your specific needs.  There are other ways to
control the trim threshold than tunables, FWIW, e.g. by using mallopt() to move
the threshold around as needed at any time in the program.
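
For illustration, a minimal sketch of the mallopt() route might look like the
following. The wrapper name set_trim_threshold is hypothetical; mallopt() and
M_TRIM_THRESHOLD come from glibc's <malloc.h>, and the setting applies to the
whole process.

```c
#include <malloc.h>   /* mallopt, M_TRIM_THRESHOLD (glibc-specific) */

/* Hypothetical wrapper: set the trim threshold for the whole process.
   Returns 1 on success and 0 on error, as mallopt() itself does. */
static int set_trim_threshold(int bytes)
{
    return mallopt(M_TRIM_THRESHOLD, bytes);
}
```

A program could call set_trim_threshold(64 * 1024) before a phase that
benefits from a lower threshold and call it again with a larger value
afterwards. The tunable mentioned in the report is the equivalent at startup,
e.g. running the program with GLIBC_TUNABLES=glibc.malloc.trim_threshold=65536
in its environment.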


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: nicolas at freedelity dot be @ 2023-06-22 15:10 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

--- Comment #2 from Nicolas Dusart <nicolas at freedelity dot be> ---
I apologize for any previous misunderstanding regarding the use of
trim_threshold in relation to heap fragmentation. I incorrectly assumed that it
was primarily relevant to the mmap use case, based on the message of the
commit.

I still have concerns about the recent change to the realloc function and I
would argue that it doesn't significantly mitigate fragmentation.

There are two primary reasons for my concern:

- Memory "freed" by realloc is not immediately accessible, even to the process
itself.
- The memory that is reallocated isn't necessarily positioned at the top of the
heap.

Consequently, we tend to accumulate more and more memory blocks that become
stagnant (in use cases where we initially allocate more memory than needed and
then realloc to the precise size). These blocks can't be released back to
the system even if their combined size exceeds the trim_threshold, since they
don't constitute a contiguous space at the top of the heap.

I maintain that the use of trim_threshold in this context is not ideal. We
might still want to retain the default threshold of 128K to prevent fragmenting
memory returned to the system, while also preventing the accumulation of
stagnant blocks up to 128K in our memory space.
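
One way to observe the first concern on a given glibc build is to ask the
allocator how much space a block still occupies after a shrinking realloc.
The sketch below is only an illustration: the helper name usable_after_shrink
is hypothetical, and malloc_usable_size() is a glibc extension from
<malloc.h>.

```c
#include <malloc.h>   /* malloc_usable_size (glibc-specific) */
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: allocate `prealloc` bytes, copy `s` into the block,
   shrink it to fit, and report how many bytes the allocator still reserves
   for it.  Returns 0 if `s` does not fit or an allocation fails. */
static size_t usable_after_shrink(const char *s, size_t prealloc)
{
    size_t need = strlen(s) + 1;          /* include the terminating NUL */
    if (need > prealloc)
        return 0;

    char *p = malloc(prealloc);
    if (p == NULL)
        return 0;
    memcpy(p, s, need);

    char *q = realloc(p, need);           /* shrink to the exact size */
    if (q == NULL) {
        free(p);
        return 0;
    }

    size_t usable = malloc_usable_size(q);
    free(q);
    return usable;
}
```

If realloc keeps the original chunk, usable_after_shrink("hello", 1000) should
stay close to 1000; if the tail is split off and returned to the allocator, it
should drop to a few tens of bytes.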


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: nicolas at freedelity dot be @ 2023-06-28  7:56 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

--- Comment #3 from Nicolas Dusart <nicolas at freedelity dot be> ---
I'm concerned that this issue may have been considered resolved, as it
may seem to have originated from a misunderstanding on my part.
But I have to stress that this change may not be appropriate. It has
significantly impacted some of our processes, which were no longer usable, and
a lot of debugging was necessary to find the root cause.
I suspect it could similarly affect many other projects using glibc. It may take
some time before production environments use the new version of glibc, so I'd
expect these concerns to surface in a few months or years.
I believe we could both benefit from further discussion on this matter, and I
would greatly appreciate gaining insight into the reasoning behind this change.

With that in mind, I'd like to respectfully re-express my concerns about the
changes observed in the recent behavior of realloc:

 - First, as I've mentioned before, it appears that the new behavior will tend
to accumulate small blocks of less than `M_TRIM_THRESHOLD` bytes inside the
heap. The total size of all these blocks may greatly exceed `M_TRIM_THRESHOLD`
bytes.

 - Secondly, this new behavior of realloc seems to be misaligned with the
documentation of M_TRIM_THRESHOLD. According to the documentation, the
"M_TRIM_THRESHOLD parameter specifies the minimum size (in bytes) that this
block of memory [at the top of the heap] must reach before sbrk is used to trim
the heap."
The current behavior, however, doesn't seem to be adhering to this definition.

 - I acknowledge that mallopt might be a solution to adapt the behavior
throughout the code based on specific needs. However, it seems to be a feasible
solution only in a single-threaded process. In a multi-threaded process, some
threads might benefit from a large M_TRIM_THRESHOLD while others might need a
smaller one. Unfortunately, mallopt doesn't seem to provide a solution in this
context.

These points bring me to the argument that this shift in behavior can be
considered a breaking change as it might have a significant impact on existing
processes, particularly those that rely on the current documentation of your
library.
Previously, we could assume that a maximum of `M_TRIM_THRESHOLD` unused
bytes could remain reserved indefinitely for our process even if we tried to
deallocate this memory. That was a limit that could be controlled.
With the new behavior, the amount of unused memory reserved for our process is
unbounded. Furthermore, the unused slots cannot be reused by our process. This
implies that this space is lost, not just to the rest of the system, but also
to the process itself until the next free() call is made. In some processes,
this newly realloced space is meant to be kept indefinitely.

Could you provide some insight into the reasons why these arguments might not
be considered strong enough to avoid such a breaking change?

Even if the decision is to stick with the new behavior, it would seem necessary
to update the documentation to reflect these changes accurately. This would
ensure that developers are made aware of this new behavior and can take
necessary steps to mitigate any impact on their processes.


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: siddhesh at sourceware dot org @ 2023-06-28 16:31 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

--- Comment #4 from Siddhesh Poyarekar <siddhesh at sourceware dot org> ---
I suppose one middle ground could be to use the fastbin size to trigger
resizing and consolidation; it's a smaller threshold by default (64K). 
Alternatively, always resize the chunk (instead of an expensive free and alloc)
and consolidate the tail with neighbouring free chunk, if any.

It's probably going to be an involved change.


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: cvs-commit at gcc dot gnu.org @ 2023-07-06 15:38 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

--- Comment #5 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The master branch has been updated by Siddhesh Poyarekar
<siddhesh@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=2fb12bbd092b0c10f1f2083216e723d2406e21c4

commit 2fb12bbd092b0c10f1f2083216e723d2406e21c4
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date:   Thu Jul 6 11:09:44 2023 -0400

    realloc: Limit chunk reuse to only growing requests [BZ #30579]

    The trim_threshold is too aggressive a heuristic to decide if chunk
    reuse is OK for reallocated memory; for repeated small, shrinking
    allocations it leads to internal fragmentation and for repeated larger
    allocations that fragmentation may blow up even worse due to the dynamic
    nature of the threshold.

    Limit reuse only when it is within the alignment padding, which is 2 *
    size_t for heap allocations and a page size for mmapped allocations.
    There's the added wrinkle of THP, but this fix ignores it for now,
    pessimizing that case in favor of keeping fragmentation low.
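
The rule described in the commit message can be pictured with the sketch
below. This is not the glibc implementation: the function name
may_reuse_block, its parameters, and the strict comparison are assumptions
made purely for illustration. It models only the stated limit of 2 * size_t
for heap blocks and one page for mmapped blocks.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper, not glibc code: decide whether a shrinking request
   may keep using the old block unchanged.  Reuse is allowed only when the
   leftover space is within the alignment padding. */
static bool may_reuse_block(size_t old_bytes, size_t new_bytes,
                            bool is_mmapped, size_t page_size)
{
    if (new_bytes > old_bytes)
        return false;            /* growing requests are not modelled here */

    size_t slack = old_bytes - new_bytes;
    size_t limit = is_mmapped ? page_size : 2 * sizeof(size_t);
    return slack < limit;
}
```

With this model, shrinking a 1000-byte heap block to 6 bytes is not a reuse
candidate, so the tail would be given back to the allocator instead of staying
attached to the string.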

    This resolves BZ #30579.

    Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
    Reported-by: Nicolas Dusart <nicolas@freedelity.be>
    Reported-by: Aurelien Jarno <aurelien@aurel32.net>
    Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
    Tested-by: Aurelien Jarno <aurelien@aurel32.net>


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: cvs-commit at gcc dot gnu.org @ 2023-07-06 15:40 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

--- Comment #6 from cvs-commit at gcc dot gnu.org <cvs-commit at gcc dot gnu.org> ---
The release/2.37/master branch has been updated by Siddhesh Poyarekar
<siddhesh@sourceware.org>:

https://sourceware.org/git/gitweb.cgi?p=glibc.git;h=0930ff8eb35cb493c945f176c3c9ab320f4d1b86

commit 0930ff8eb35cb493c945f176c3c9ab320f4d1b86
Author: Siddhesh Poyarekar <siddhesh@sourceware.org>
Date:   Thu Jul 6 11:09:44 2023 -0400

    realloc: Limit chunk reuse to only growing requests [BZ #30579]

    The trim_threshold is too aggressive a heuristic to decide if chunk
    reuse is OK for reallocated memory; for repeated small, shrinking
    allocations it leads to internal fragmentation and for repeated larger
    allocations that fragmentation may blow up even worse due to the dynamic
    nature of the threshold.

    Limit reuse only when it is within the alignment padding, which is 2 *
    size_t for heap allocations and a page size for mmapped allocations.
    There's the added wrinkle of THP, but this fix ignores it for now,
    pessimizing that case in favor of keeping fragmentation low.

    This resolves BZ #30579.

    Signed-off-by: Siddhesh Poyarekar <siddhesh@sourceware.org>
    Reported-by: Nicolas Dusart <nicolas@freedelity.be>
    Reported-by: Aurelien Jarno <aurelien@aurel32.net>
    Reviewed-by: Aurelien Jarno <aurelien@aurel32.net>
    Tested-by: Aurelien Jarno <aurelien@aurel32.net>
    (cherry picked from commit 2fb12bbd092b0c10f1f2083216e723d2406e21c4)


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: siddhesh at sourceware dot org @ 2023-07-06 15:42 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

Siddhesh Poyarekar <siddhesh at sourceware dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|UNCONFIRMED                 |RESOLVED
         Resolution|---                         |FIXED
           Assignee|unassigned at sourceware dot org   |siddhesh at sourceware dot org
   Target Milestone|---                         |2.38

--- Comment #7 from Siddhesh Poyarekar <siddhesh at sourceware dot org> ---
Fixed in master and 2.37.


* [Bug malloc/30579] trim_threshold in realloc lead to high memory usage
From: nicolas at freedelity dot be @ 2023-07-06 16:09 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30579

--- Comment #8 from Nicolas Dusart <nicolas at freedelity dot be> ---
Thanks a lot for the fix in the 2.37 branch!
