From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <sourceware-bugzilla@sourceware.org>
Received: by sourceware.org (Postfix, from userid 48)
	id 1CE293857704; Mon, 10 Jul 2023 13:01:06 +0000 (GMT)
DKIM-Filter: OpenDKIM Filter v2.11.0 sourceware.org 1CE293857704
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=sourceware.org;
	s=default; t=1688994066;
	bh=XEoMIhakh9ziBHKGg5IkIWIrT7g1aUFtIptRJDe7UXU=;
	h=From:To:Subject:Date:From;
	b=WxPAKgkkvI9L/0BntYShGzrLkVvNzNDVHhQgsyJriwY2t+PJGLOYdmDTrgzxIyuDp
	 3m+8Jnmf6Yxn3EZx83XnVnNhvAlE7Vq8m6P5ss1XFp0mSC38U/w1EKECK6nk5/KsvJ
	 JqLiVZTtFm/5epwMKIYmsFGSdU1xT+RjopdjUrQU=
From: "safinaskar at mail dot ru" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6
Date: Mon, 10 Jul 2023 13:01:05 +0000
X-Bugzilla-Reason: CC
X-Bugzilla-Type: new
X-Bugzilla-Watch-Reason: None
X-Bugzilla-Product: glibc
X-Bugzilla-Component: malloc
X-Bugzilla-Version: 2.36
X-Bugzilla-Keywords: 
X-Bugzilla-Severity: normal
X-Bugzilla-Who: safinaskar at mail dot ru
X-Bugzilla-Status: UNCONFIRMED
X-Bugzilla-Resolution: 
X-Bugzilla-Priority: P2
X-Bugzilla-Assigned-To: unassigned at sourceware dot org
X-Bugzilla-Target-Milestone: ---
X-Bugzilla-Flags: 
X-Bugzilla-Changed-Fields: bug_id short_desc product version bug_status
 bug_severity priority component assigned_to reporter target_milestone
Message-ID: <bug-30625-131@http.sourceware.org/bugzilla/>
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Bugzilla-URL: http://sourceware.org/bugzilla/
Auto-Submitted: auto-generated
MIME-Version: 1.0
List-Id: <glibc-bugs.sourceware.org>

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

            Bug ID: 30625
           Summary: Moving "free(buf)" slows down code x1.6
           Product: glibc
           Version: 2.36
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: safinaskar at mail dot ru
  Target Milestone: ---

A small adjustment to the placement of malloc/free calls slows the code down
by a factor of 1.6. It seems I have found some worst-case behavior in glibc's
allocator.

Here is the code:

====
#define HASH_VEC_ITEM_SIZE (3 * 8)
#define BUF_SIZE 4194304

#include <stdlib.h>
#include <stddef.h>
#include <string.h>

#define ASSERT(cond) do { if(!(cond))abort(); } while(0)

struct vec
{
  unsigned char *data;
  size_t size;
  size_t capacity; // capacity >= size
};

// We emulate rust's Vec::push
void
push (struct vec *v, size_t item_size, unsigned char *new_data)
{
  ASSERT(v->capacity >= v->size);
  if (v->size + item_size <= v->capacity)
    {
      memcpy(v->data + v->size, new_data, item_size);
      v->size += item_size;
      return;
    }
  v->capacity *= 2;
  if(v->capacity < v->size + item_size)
    {
      v->capacity = v->size + item_size;
    }
  v->data = realloc(v->data, v->capacity);
  memcpy(v->data + v->size, new_data, item_size);
  v->size += item_size;
  ASSERT(v->capacity >= v->size);
}

// To prevent optimization
// https://boringssl.googlesource.com/boringssl/+/e38cf79cdf47606f6768fb85dc066d7ebce304ac/crypto/internal.h#281
void
black_box (unsigned char *arg) __attribute__((noinline));
void
black_box (unsigned char *arg)
{
  asm volatile("" : "+r"(arg) :);
  asm volatile("" : "+r"(arg[0]) :);
}

int
main ()
{
  struct vec hash_vec = { .data = malloc(HASH_VEC_ITEM_SIZE), .size = 0,
                          .capacity = HASH_VEC_ITEM_SIZE };
  for(int n = 0; n != 100; ++n)
    {
      unsigned char *buf = calloc(BUF_SIZE, 1);
      for(int i = 0; i != 5; ++i)
        {
          unsigned char *buf_clone = malloc(BUF_SIZE);
          memcpy(buf_clone, buf, BUF_SIZE);
          black_box(buf_clone);
          free(buf_clone);
        }
      calloc(2, 1); // We don't free this memory, we free everything else
      free(buf); //bad placement
      unsigned char new_item[HASH_VEC_ITEM_SIZE] = {0};
      push(&hash_vec, HASH_VEC_ITEM_SIZE, new_item);
      //free(buf); //good placement
    }
  free(hash_vec.data);
}
====
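
To see how often push() actually reaches realloc in this test, the capacity
arithmetic can be replayed without allocating anything. This is only a sketch:
simulate_growth and struct growth_stats are hypothetical names that are not
part of the reproducer, and the sketch assumes the same starting state as
main() (size 0, capacity equal to one item).

```c
#include <stddef.h>

/* Hypothetical helper, not part of the reproducer: records what the
   capacity arithmetic of push() would do, without allocating.  */
struct growth_stats
{
  size_t reallocs;       /* calls that would reach realloc() */
  size_t final_size;     /* bytes stored after all pushes */
  size_t final_capacity; /* capacity in bytes after all pushes */
};

struct growth_stats
simulate_growth (size_t item_size, size_t pushes)
{
  /* Same starting state as in main(): size 0, capacity of one item.  */
  struct growth_stats s = { 0, 0, item_size };
  for (size_t n = 0; n != pushes; ++n)
    {
      if (s.final_size + item_size > s.final_capacity)
        {
          /* Same policy as push(): double, then clamp up if needed.  */
          s.final_capacity *= 2;
          if (s.final_capacity < s.final_size + item_size)
            s.final_capacity = s.final_size + item_size;
          ++s.reallocs;
        }
      s.final_size += item_size;
    }
  return s;
}
```

For simulate_growth (24, 100), which matches the loop in main(), this gives 7
realloc calls and a final capacity of 3072 bytes, so hash_vec stays tiny
compared with the 4 MiB buffers.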

Compile it like this: "gcc -O3 -o /tmp/t /tmp/t.c"

"/lib64/ld-linux-x86-64.so.2 --version" output:

====
ld.so (Debian GLIBC 2.36-9) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
====

GCC is 12.3.0.

Everything runs on Debian sid x86_64 in a Docker container on Linux 5.10.0.

With "free(buf)" at the "good placement", the code above runs in 0.17 s; at
the "bad placement", it takes 0.28 s (i.e. about 1.6 times slower).
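
The times above are for the whole program. If someone wants to time only a
part of it from inside the process, something like the following could be
used. It is a minimal sketch: elapsed_seconds and sample_workload are
hypothetical names, and sample_workload is only a stand-in for the real loop
in main().

```c
#define _POSIX_C_SOURCE 199309L /* for clock_gettime */

#include <stdlib.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: wall-clock seconds spent in fn(), or -1.0 on error.  */
double
elapsed_seconds (void (*fn) (void))
{
  struct timespec t0, t1;
  if (clock_gettime (CLOCK_MONOTONIC, &t0) != 0)
    return -1.0;
  fn ();
  if (clock_gettime (CLOCK_MONOTONIC, &t1) != 0)
    return -1.0;
  return (double) (t1.tv_sec - t0.tv_sec)
         + (double) (t1.tv_nsec - t0.tv_nsec) / 1e9;
}

/* Hypothetical stand-in for the loop in main() of the reproducer:
   allocates, fills and frees a 4 MiB buffer five times.  */
void
sample_workload (void)
{
  for (int i = 0; i != 5; ++i)
    {
      unsigned char *p = malloc (4194304);
      if (p == NULL)
        abort ();
      memset (p, 1, 4194304);
      /* Empty asm so the compiler cannot drop the stores.  */
      __asm__ volatile ("" : : "r"(p) : "memory");
      free (p);
    }
}
```

Calling elapsed_seconds on each variant and dividing the results gives the
slowdown factor; with the numbers above, 0.28 / 0.17 is about 1.6.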

This bug was triggered in a real production case. Here is the context:
https://github.com/rust-lang/rust/issues/113504 . @saethlin reduced the
example further here:
https://github.com/rust-lang/rust/issues/113504#issuecomment-1627852074 and
then I reduced it even further, to C, in this (glibc) bug report.

So a small change in the placement of "free" slows the code down
significantly; I think this is a bug. @saethlin also adds: "Based on
profiling, the difference seems attributable to a 44x (!!) difference in the
number of page faults between the two implementations. If I swap in jemalloc
or mimalloc, the difference in runtime and page faults goes away. So I
strongly suspect that this code is generating some worst-case behavior in
glibc's allocator"
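
The page-fault difference can also be checked from inside the process with
getrusage(2). The sketch below is not how @saethlin measured it (the quote
does not say); page_faults_so_far and faults_for_touching are hypothetical
names, and the workload is just an example that touches a fresh buffer.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Minor + major page faults of this process so far, or -1 on error.  */
long
page_faults_so_far (void)
{
  struct rusage ru;
  if (getrusage (RUSAGE_SELF, &ru) != 0)
    return -1;
  return ru.ru_minflt + ru.ru_majflt;
}

/* Hypothetical workload: page faults taken while allocating a buffer of
   the given size and writing to every byte of it.  Returns -1 on error.  */
long
faults_for_touching (size_t bytes)
{
  long before = page_faults_so_far ();
  unsigned char *p = malloc (bytes);
  if (p == NULL || before < 0)
    {
      free (p);
      return -1;
    }
  memset (p, 1, bytes);
  /* Empty asm so the compiler cannot drop the stores.  */
  __asm__ volatile ("" : : "r"(p) : "memory");
  long after = page_faults_so_far ();
  free (p);
  return after - before;
}
```

Reading page_faults_so_far () at the end of main() in both variants of the
reproducer would show whether the counts differ as described in the quote.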
