From: "safinaskar at mail dot ru"
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/30625] New: Moving "free(buf)" slows down code x1.6
Date: Mon, 10 Jul 2023 13:01:05 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=30625

            Bug ID: 30625
           Summary: Moving "free(buf)" slows down code x1.6
           Product: glibc
           Version: 2.36
            Status: UNCONFIRMED
          Severity: normal
          Priority: P2
         Component: malloc
          Assignee: unassigned at sourceware dot org
          Reporter: safinaskar at mail dot ru
  Target Milestone: ---

Small adjustments to the placement of malloc/free calls slow this code down by a factor of 1.6. It seems I have found some worst-case behavior in glibc's allocator.
Here is the code:

====
#define HASH_VEC_ITEM_SIZE (3 * 8)
#define BUF_SIZE 4194304

#include <stddef.h>
#include <stdlib.h>
#include <string.h>

#define ASSERT(cond) do { if (!(cond)) abort(); } while (0)

struct vec {
    unsigned char *data;
    size_t size;
    size_t capacity; // capacity >= size
};

// We emulate Rust's Vec::push: append item_size bytes, doubling the
// capacity (or growing just enough) when the buffer is full.
void push(struct vec *v, size_t item_size, unsigned char *new_data)
{
    ASSERT(v->capacity >= v->size);
    if (v->size + item_size <= v->capacity) {
        memcpy(v->data + v->size, new_data, item_size);
        v->size += item_size;
        return;
    }
    v->capacity *= 2;
    if (v->capacity < v->size + item_size) {
        v->capacity = v->size + item_size;
    }
    v->data = realloc(v->data, v->capacity);
    ASSERT(v->data != NULL);
    memcpy(v->data + v->size, new_data, item_size);
    v->size += item_size;
    ASSERT(v->capacity >= v->size);
}

// To prevent optimization
// https://boringssl.googlesource.com/boringssl/+/e38cf79cdf47606f6768fb85dc066d7ebce304ac/crypto/internal.h#281
void black_box(unsigned char *arg) __attribute__((noinline));
void black_box(unsigned char *arg)
{
    asm volatile("" : "+r"(arg) :);
    asm volatile("" : "+r"(arg[0]) :);
}

int main(void)
{
    struct vec hash_vec = { .data = malloc(HASH_VEC_ITEM_SIZE), .size = 0, .capacity = HASH_VEC_ITEM_SIZE };
    for (int n = 0; n != 100; ++n) {
        unsigned char *buf = calloc(BUF_SIZE, 1);
        // Copy the 4 MiB buffer five times, freeing each copy immediately
        for (int i = 0; i != 5; ++i) {
            unsigned char *buf_clone = malloc(BUF_SIZE);
            memcpy(buf_clone, buf, BUF_SIZE);
            black_box(buf_clone);
            free(buf_clone);
        }
        calloc(2, 1); // We don't free this memory, we free everything else
        free(buf); // bad placement
        unsigned char new_item[HASH_VEC_ITEM_SIZE] = {0};
        push(&hash_vec, HASH_VEC_ITEM_SIZE, new_item);
        //free(buf); // good placement
    }
    free(hash_vec.data);
}
====

Compile with: "gcc -O3 -o /tmp/t /tmp/t.c"

Output of "/lib64/ld-linux-x86-64.so.2 --version":

====
ld.so (Debian GLIBC 2.36-9) stable release version 2.36.
Copyright (C) 2022 Free Software Foundation, Inc.
This is free software; see the source for copying conditions.
There is NO warranty; not even for MERCHANTABILITY or FITNESS FOR A
PARTICULAR PURPOSE.
====

gcc is 12.3.0. Everything runs in a Debian sid x86_64 Docker container on Linux 5.10.0.

With "free(buf)" in the "good placement", the code above runs in 0.17 s; in the "bad placement", it takes 0.28 s (i.e., 1.6x slower).

This bug was triggered in a real production case. Here is the context: https://github.com/rust-lang/rust/issues/113504 . @saethlin reduced the example further here: https://github.com/rust-lang/rust/issues/113504#issuecomment-1627852074 and then I reduced it even more, to C, in this (glibc) bug report.

So a small change in where "free" is placed slows the code down significantly; I think this is a bug. @saethlin also adds: "Based on profiling, the difference seems attributable to a 44x (!!) difference in the number of page faults between the two implementations. If I swap in jemalloc or mimalloc, the difference in runtime and page faults goes away. So I strongly suspect that this code is generating some worst-case behavior in glibc's allocator"
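To compare the page-fault counts of the two placements locally, one option is to count minor page faults around the loop with POSIX getrusage() (the ru_minflt field). The sketch below is a hypothetical harness, not part of the original report: count_faults(), minor_faults() and keep_alive() are illustrative names, and keep_alive() is a GCC/Clang inline-asm barrier standing in for black_box(). It does not reproduce the 44x figure by itself; it only returns the ru_minflt delta for one run of the loop.

```c
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

#define HASH_VEC_ITEM_SIZE (3 * 8)
#define BUF_SIZE 4194304

/* Minor (soft) page faults taken by this process so far, or -1 on error. */
static long minor_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt;
}

/* Compiler barrier (GCC/Clang inline asm): the pointer is treated as used
   and memory as read, so the copy into it cannot be optimized away. */
static void keep_alive(void *p)
{
    __asm__ __volatile__("" : : "r"(p) : "memory");
}

/* Hypothetical harness: runs the report's loop `iterations` times and
   returns the number of minor page faults taken during the loop.
   free_buf_last == 0 -> "bad placement"  (free(buf) before the push)
   free_buf_last != 0 -> "good placement" (free(buf) after the push) */
static long count_faults(int iterations, int free_buf_last)
{
    size_t size = 0, capacity = HASH_VEC_ITEM_SIZE;
    unsigned char *vec = malloc(capacity);
    unsigned char item[HASH_VEC_ITEM_SIZE] = {0};
    if (vec == NULL)
        abort();

    long before = minor_faults();
    for (int n = 0; n < iterations; ++n) {
        unsigned char *buf = calloc(BUF_SIZE, 1);
        if (buf == NULL)
            abort();
        for (int i = 0; i < 5; ++i) {
            unsigned char *clone = malloc(BUF_SIZE);
            if (clone == NULL)
                abort();
            memcpy(clone, buf, BUF_SIZE);
            keep_alive(clone);
            free(clone);
        }
        keep_alive(calloc(2, 1)); /* leaked on purpose, as in the report */
        if (!free_buf_last)
            free(buf);

        /* Same growth policy as push(): double, or grow just enough. */
        if (size + HASH_VEC_ITEM_SIZE > capacity) {
            capacity *= 2;
            if (capacity < size + HASH_VEC_ITEM_SIZE)
                capacity = size + HASH_VEC_ITEM_SIZE;
            unsigned char *grown = realloc(vec, capacity);
            if (grown == NULL)
                abort();
            vec = grown;
        }
        memcpy(vec + size, item, HASH_VEC_ITEM_SIZE);
        size += HASH_VEC_ITEM_SIZE;

        if (free_buf_last)
            free(buf);
    }
    long after = minor_faults();
    free(vec);
    return after - before;
}
```

Because glibc's malloc keeps adaptive state between calls, the two placements should be measured in separate processes, e.g. a main() that prints count_faults(100, argc > 1) and is run once without arguments (bad placement) and once with any argument (good placement).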