public inbox for glibc-bugs@sourceware.org
help / color / mirror / Atom feed
* [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6
@ 2023-07-10 13:01 safinaskar at mail dot ru
  2023-07-11  7:37 ` [Bug malloc/30625] " fweimer at redhat dot com
                   ` (3 more replies)
  0 siblings, 4 replies; 5+ messages in thread
From: safinaskar at mail dot ru @ 2023-07-10 13:01 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

            Bug ID: 30625
           Summary: Moving "free(buf)" slows down code x1.6
           Product: glibc
           Version: 2.36
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: safinaskar at mail dot ru
  Target Milestone: ---

Small adjustments to placement of malloc/free slows down code x1.6. It seems I
found some worst-case behavior in glibc's allocator.

Here is code:

====
#define HASH_VEC_ITEM_SIZE (3 * 8)
#define BUF_SIZE 4194304

#include <stdlib.h>
#include <stddef.h>
#include <string.h>

#define ASSERT(cond) do { if(!(cond))abort(); } while(0)

struct vec
{
  unsigned char *data;
  size_t size;
  size_t capacity; // capacity >= size
};

// We emulate rust's Vec::push
void
push (struct vec *v, size_t item_size, unsigned char *new_data)
{
  ASSERT(v->capacity >= v->size);
  if (v->size + item_size <= v->capacity)
    {
      memcpy(v->data + v->size, new_data, item_size);
      v->size += item_size;
      return;
    }
  v->capacity *= 2;
  if(v->capacity < v->size + item_size)
    {
      v->capacity = v->size + item_size;
    }
  v->data = realloc(v->data, v->capacity);
  memcpy(v->data + v->size, new_data, item_size);
  v->size += item_size;
  ASSERT(v->capacity >= v->size);
}

// To prevent optimization
//
https://boringssl.googlesource.com/boringssl/+/e38cf79cdf47606f6768fb85dc066d7ebce304ac/crypto/internal.h#281
void
black_box (unsigned char *arg) __attribute__((noinline));
void
black_box (unsigned char *arg)
{
  asm volatile("" : "+r"(arg) :);
  asm volatile("" : "+r"(arg[0]) :);
}

int
main ()
{
  struct vec hash_vec = { .data = malloc(HASH_VEC_ITEM_SIZE), .size = 0,
.capacity = HASH_VEC_ITEM_SIZE };
  for(int n = 0; n != 100; ++n)
    {
      unsigned char *buf = calloc(BUF_SIZE, 1);
      for(int i = 0; i != 5; ++i)
        {
          unsigned char *buf_clone = malloc(BUF_SIZE);
          memcpy(buf_clone, buf, BUF_SIZE);
          black_box(buf_clone);
          free(buf_clone);
        }
      calloc(2, 1); // We don't free this memory, we free everything else
      free(buf); //bad placement
      unsigned char new_item[HASH_VEC_ITEM_SIZE] = {0};
      push(&hash_vec, HASH_VEC_ITEM_SIZE, new_item);
      //free(buf); //good placement
    }
  free(hash_vec.data);
}
====

Compile so: "gcc -O3 -o /tmp/t /tmp/t.c"

"/lib64/ld-linux-x86-64.so.2 --version" output:

====
ld.so (Debian GLIBC 2.36-9) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
====

gcc is 12.3.0

Everything happens in debian sid x86_64 in docker container in Linux 5.10.0

When "free(buf)" placed in "good placement" the code above runs 0.17 s, in "bad
placement" - 0.28 s (i. e. x1.6 slower)

This bug was triggered in absolutely real production case. Here is context:
https://github.com/rust-lang/rust/issues/113504 . @saethlin reduced the example
further here:
https://github.com/rust-lang/rust/issues/113504#issuecomment-1627852074 and
then I reduced it even more to C language in this (glibc) bug report.

So, small adjustments to "free" placement slows down code significantly, I
think this is a bug. @saethlin also adds: "Based on profiling, the difference
seems attributable to a 44x (!!) difference in the number of page faults
between the two implementations. If I swap in jemalloc or mimalloc, the
difference in runtime and page faults goes away. So I strongly suspect that
this code is generating some worst-case behavior in glibc's allocator"

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug malloc/30625] Moving "free(buf)" slows down code x1.6
  2023-07-10 13:01 [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6 safinaskar at mail dot ru
@ 2023-07-11  7:37 ` fweimer at redhat dot com
  2023-07-11 11:21 ` safinaskar at mail dot ru
                   ` (2 subsequent siblings)
  3 siblings, 0 replies; 5+ messages in thread
From: fweimer at redhat dot com @ 2023-07-11  7:37 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

Florian Weimer <fweimer at redhat dot com> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |fweimer at redhat dot com
              Flags|                            |security-

--- Comment #1 from Florian Weimer <fweimer at redhat dot com> ---
The bad placement results in many brk calls (about three per loop iteration).
It's still present in the current sources.

The trimming heuristics are not unreasonable in this case, given how large the
BUF_SIZE allocation is. What you call “good placement” prevents heap trimming;
others have complained that we don't release memory in the middle of the heap
to the operating system, and they would call this “bad placement”.

We just don't know if the program will go to sleep after the free call, or
enter some longer computation not calling malloc/free, so if free detects that
there is a certain amount of memory that could be returned to the kernel, it
does that.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug malloc/30625] Moving "free(buf)" slows down code x1.6
  2023-07-10 13:01 [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6 safinaskar at mail dot ru
  2023-07-11  7:37 ` [Bug malloc/30625] " fweimer at redhat dot com
@ 2023-07-11 11:21 ` safinaskar at mail dot ru
  2023-07-11 12:15 ` fweimer at redhat dot com
  2024-01-14 18:58 ` sam at gentoo dot org
  3 siblings, 0 replies; 5+ messages in thread
From: safinaskar at mail dot ru @ 2023-07-11 11:21 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

--- Comment #2 from Askar Safin <safinaskar at mail dot ru> ---
Please, return memory to OS only if 2 following conditions hold in the same
time:
- Allocated memory is significantly smaller than latest peak
- Enough time passed from latest peak

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug malloc/30625] Moving "free(buf)" slows down code x1.6
  2023-07-10 13:01 [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6 safinaskar at mail dot ru
  2023-07-11  7:37 ` [Bug malloc/30625] " fweimer at redhat dot com
  2023-07-11 11:21 ` safinaskar at mail dot ru
@ 2023-07-11 12:15 ` fweimer at redhat dot com
  2024-01-14 18:58 ` sam at gentoo dot org
  3 siblings, 0 replies; 5+ messages in thread
From: fweimer at redhat dot com @ 2023-07-11 12:15 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

--- Comment #3 from Florian Weimer <fweimer at redhat dot com> ---
(In reply to Askar Safin from comment #2)
> Please, return memory to OS only if 2 following conditions hold in the same
> time:
> - Allocated memory is significantly smaller than latest peak

I think this is true in both cases in the reproducer?  Doesn't the memory drop
by one third of the peak heap size (twice)?

> - Enough time passed from latest peak

That can cause a lot of memory be left behind if the program goes to sleep, as
I tried to explain. In a multi-threaded program, we could perhaps reclaim the
memory from a service thread, but in a single-threaded program this isn't
possible.

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

* [Bug malloc/30625] Moving "free(buf)" slows down code x1.6
  2023-07-10 13:01 [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6 safinaskar at mail dot ru
                   ` (2 preceding siblings ...)
  2023-07-11 12:15 ` fweimer at redhat dot com
@ 2024-01-14 18:58 ` sam at gentoo dot org
  3 siblings, 0 replies; 5+ messages in thread
From: sam at gentoo dot org @ 2024-01-14 18:58 UTC (permalink / raw)
  To: glibc-bugs

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

Sam James <sam at gentoo dot org> changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
                 CC|                            |sam at gentoo dot org

-- 
You are receiving this mail because:
You are on the CC list for the bug.

^ permalink raw reply	[flat|nested] 5+ messages in thread

end of thread, other threads:[~2024-01-14 18:58 UTC | newest]

Thread overview: 5+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2023-07-10 13:01 [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6 safinaskar at mail dot ru
2023-07-11  7:37 ` [Bug malloc/30625] " fweimer at redhat dot com
2023-07-11 11:21 ` safinaskar at mail dot ru
2023-07-11 12:15 ` fweimer at redhat dot com
2024-01-14 18:58 ` sam at gentoo dot org

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).