From: "keyid.w at qq dot com"
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/26969] A common malloc pattern can make memory not given back to OS
Date: Tue, 01 Dec 2020 08:43:30 +0000

https://sourceware.org/bugzilla/show_bug.cgi?id=26969

keyid.w at qq dot com changed:

           What    |Removed     |Added
----------------------------------------------------------------------------
             Status|RESOLVED    |UNCONFIRMED
         Resolution|NOTABUG     |---

--- Comment #2 from keyid.w at qq dot com ---
(In reply to Carlos O'Donell from comment #1)
> The glibc implementation of malloc is a heap-based allocator, and in that
> design the heap must be logically freed back down in the order in which it
> was originally allocated, or the heap will continue to grow to keep a
> maximum working set of chunks for the application.
>
> If you want to free back down to zero at the last deallocation, you must
> tune the allocator by disabling fastbins and tcache.
>
> For example:
> - Allocate A
> - Allocate B
> - Allocate C
> - Free A
> - Free B
> - Free C
>
> Consider that A, B and C are all the same size.
>
> Until "Free C" happens, the entire stack is held at 3 objects deep.
>
> This can happen because tcache or fastbins hold the most recently freed
> chunk for re-use. There is nothing wrong with this strategy: the C library
> does not know, a priori, whether you will carry out this entire workload
> again.
>
> The worst-case degenerate situation for tcache is a sequence of allocations
> which causes tcache to always hold the top-of-heap chunks as in-use. In a
> real program those chunks are refilled into the tcache much more randomly,
> via malloc, from the unsorted-bin or small-bin refill strategy. Thus tcache
> should not keep the top of the heap from freeing down in those cases. It is
> only in synthetic test cases like this that I think you see tcache being
> the blocker to freeing down from the top of the heap.
>
> If you need to free pages between workloads and while idle, you can call
> malloc_trim() to release page-sized consolidated parts of the heaps.
>
> If you need a minimal working set, then you need to turn off fastbins and
> tcache.
>
> One possible enhancement we can make is to split the heaps by pool sizes,
> and that is something I have talked about a bit with DJ Delorie. As it
> stands, though, that would be a distinct enhancement.
> I'm marking this as RESOLVED/NOTABUG since the algorithm is working as
> intended but doesn't meet your specific synthetic workload. If you have a
> real non-synthetic workload that exhibits problems, please open a bug and
> we can talk about it, review performance, and capture an API trace.

Thanks for your reply! I did face this problem in a real workload. I tried
to simplify my code, but the final version is still a little complex and is
in C++; it is attached at the end.

There is a thread queue that executes n tasks with m worker threads. Each
task stores some calculated (field, value) data into a map. In my real
workload I calculate double values from loaded data and store them in the
map; the calculation itself is very complex, so I have simplified it here.
I think creating the map's keys (short strings) corresponds to malloc-ing
small pieces, and creating the map's values (large vectors) corresponds to
malloc-ing large pieces. However, if I don't use the thread queue, the
memory is released, so I suspect that some allocation made internally by
the STL machinery in the thread queue compounds the problem. In fact, I
used gdb to inspect the tcache/fastbin chunks near the top of the heap and
found they were probably allocated to something in the STL. Also, if I
comment out the line "return dp;" in main() and un-comment the two lines
above it ("delete dp;" and "return nullptr;"), the memory is released. I
don't know why.

You can compile the test with "g++ test.cpp -o test -lpthread" and run it
with "./test task_number thread_number".
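For reference, here is a minimal sketch of the tuning suggested in comment
#1, as I understand it from the glibc manual (the tunable name and the
mallopt()/malloc_trim() calls below reflect my reading of the
documentation, not anything verified in this report). tcache can be
disabled with a tunable when starting the program, e.g.

  GLIBC_TUNABLES=glibc.malloc.tcache_count=0 ./test 100 4

and fastbins can be disabled, and the heap trimmed between workloads, from
inside the program:

#include <malloc.h>  // mallopt, malloc_trim (glibc extensions)

void TuneAllocator() {
  // Setting M_MXFAST to 0 disables fastbins, so small freed chunks can
  // consolidate instead of being held for re-use.
  mallopt(M_MXFAST, 0);
}

void BetweenWorkloads() {
  // Release page-sized consolidated parts of the heaps back to the OS.
  malloc_trim(0);
}

The simplified test program follows.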
#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <future>
#include <list>
#include <map>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <vector>

using namespace std;

class TestClass {
 public:
  void DoSomething() {
    const int Count = 10000;
    map<string, vector<double>> values;
    for (int i = 0; i < Count; ++i) {
      vector<double> v(10000);
      values[std::to_string(i)] = v;
    }
  }
};

class MultiThreadWorkQueue {
 public:
  // Constructor.
  // cache_size is the maximum capacity of the result cache.
  // n_threads is the number of worker threads. If it is 0, it will be set
  // to a reasonable value based on the number of cores.
  // If cache_size is 0, it will be set to n_threads.
  MultiThreadWorkQueue(int cache_size, int n_threads)
      : cache_size_(cache_size), n_threads_(n_threads) {
    for (int i = 0; i < n_threads_; ++i) {
      workers_.push_back(
          std::thread(&MultiThreadWorkQueue::ProcessTasks, this));
    }
  }

  ~MultiThreadWorkQueue() { Abort(); }

  void Enqueue(std::function<TestClass*()>&& func) {
    {
      std::unique_lock<std::mutex> ul(tasks_mutex_);
      tasks_.emplace(std::forward<std::function<TestClass*()>>(func));
    }
    worker_cv_.notify_one();
  }

  // Gets the result of the next task in the queue. If it is still pending,
  // blocks the current thread and waits until the result is available.
  //
  // Note that if this is called after Abort(), it will crash.
  TestClass* Dequeue() {
    std::unique_lock<std::mutex> ul(tasks_mutex_);
    dequeue_cv_.wait(ul, [this] { return aborted_ || returns_.size() > 0; });
    std::future<TestClass*> future = std::move(returns_.front());
    returns_.pop();
    ul.unlock();
    worker_cv_.notify_one();
    return future.get();
  }

  // Stop executing any new tasks and join all the worker threads.
  void Abort() {
    {
      std::unique_lock<std::mutex> ul(tasks_mutex_);
      if (aborted_) {
        return;
      } else {
        aborted_ = true;
      }
    }
    worker_cv_.notify_all();
    dequeue_cv_.notify_all();
    for (auto& thread : workers_) {
      thread.join();
    }
  }

  // Size = N(Enqueue) - N(Dequeue).
  size_t Size() {
    std::unique_lock<std::mutex> ul(tasks_mutex_);
    return returns_.size() + tasks_.size();
  }

 private:
  void ProcessTasks() {
    std::unique_lock<std::mutex> ul(tasks_mutex_);
    while (!aborted_) {
      worker_cv_.wait(ul, [this]() {
        return aborted_ ||
               (tasks_.size() > 0 &&
                returns_.size() < static_cast<size_t>(cache_size_));
      });
      if (aborted_) {
        break;
      }
      std::packaged_task<TestClass*()> t;
      t.swap(tasks_.front());
      tasks_.pop();
      returns_.emplace(t.get_future());
      ul.unlock();
      dequeue_cv_.notify_one();
      t();
      ul.lock();
    }
  }

  std::mutex tasks_mutex_;
  std::atomic<bool> aborted_{false};
  int cache_size_;
  int n_threads_;
  std::condition_variable worker_cv_;
  std::condition_variable dequeue_cv_;
  std::queue<std::packaged_task<TestClass*()>> tasks_;
  std::queue<std::future<TestClass*>> returns_;
  std::list<std::thread> workers_;
};

int main(int argc, char** argv) {
  int n = atoi(argv[1]);
  int thread_num = atoi(argv[2]);

  auto CreateDP = [] {
    TestClass* dp = new TestClass;
    dp->DoSomething();
    // delete dp;
    // return nullptr;
    return dp;
  };

  printf("* before run, press enter to continue");
  fflush(stdout);
  std::getchar();

  if (thread_num > 0) {
    printf("Multi-thread\n");
    MultiThreadWorkQueue work_queue(10, thread_num);
    for (int i = 0; i < n; ++i) {
      work_queue.Enqueue(CreateDP);
    }
    for (int i = 0; i < n; ++i) {
      std::unique_ptr<TestClass> dp(work_queue.Dequeue());
    }
  } else {
    printf("Single-thread\n");
    for (int i = 0; i < n; ++i) {
      fflush(stdout);
      std::unique_ptr<TestClass> dp(CreateDP());
    }
  }

  printf("* after run, press enter to continue");
  fflush(stdout);
  std::getchar();
  return 0;
}

-- 
You are receiving this mail because:
You are on the CC list for the bug.