public inbox for glibc-bugs@sourceware.org
From: "keyid.w at qq dot com" <sourceware-bugzilla@sourceware.org>
To: glibc-bugs@sourceware.org
Subject: [Bug malloc/26969] A common malloc pattern can make memory not given back to OS
Date: Tue, 01 Dec 2020 08:43:30 +0000	[thread overview]
Message-ID: <bug-26969-131-3sIrEluRom@http.sourceware.org/bugzilla/> (raw)
In-Reply-To: <bug-26969-131@http.sourceware.org/bugzilla/>

https://sourceware.org/bugzilla/show_bug.cgi?id=26969

keyid.w at qq dot com changed:

           What    |Removed                     |Added
----------------------------------------------------------------------------
             Status|RESOLVED                    |UNCONFIRMED
         Resolution|NOTABUG                     |---

--- Comment #2 from keyid.w at qq dot com ---
(In reply to Carlos O'Donell from comment #1)
> The glibc implementation of malloc is a heap-based allocator, and in that
> design the heap must be logically freed back down in the order in which it
> was originally allocated, or the heap will continue to grow to keep a
> maximum working set of chunks for the application.
> 
> If you want to free back down to zero at the last deallocation you must tune
> the allocator by disabling fastbins and tcache.
> 
> For example:
> - Allocate A
> - Allocate B
> - Allocate C
> - Free A
> - Free B
> - Free C
> 
> Consider A, B and C are all the same size.
> 
> Until "Free C" happens the entire stack is held at 3 objects deep.
> 
> This can happen because tcache or fastbins hold the most recently freed
> chunk for re-use. There is nothing wrong with this strategy because the C
> library, a priori, doesn't know if you'll carry out this entire workload
> again.
> 
> The worst-case degenerate situation for tcache is a sequence of allocations
> which cause tcache to always hold the top-of-heap chunks as in-use. In a
> real program those chunks are refilled into the tcache much more randomly
> via malloc from the unsorted bin or small bin refill strategy. Thus tcache
> should not keep the top-of-heap from freeing down in those cases. It's only
> in synthetic test cases like this where I think you see tcache being the
> blocker to freeing down from the top of heap.
> 
> If you need to free pages between workloads and while idle you can call
> malloc_trim() to release page-sized consolidated parts of the heaps.
> 
> If you need a minimal working set, then you need to turn off fastbins and
> tcache.
> 
> One possible enhancement we can make is to split the heaps by pool sizes,
> and that's something I've talked about a bit with DJ Delorie. As it stands
> though that would be a distinct enhancement.
> 
> I'm marking this as RESOLVED/NOTABUG since the algorithm is working as
> intended but doesn't meet your specific synthetic workload. If you have a
> real non-synthetic workload that exhibits problems please open a bug and we
> can talk about it and review performance and capture an API trace.

Thanks for your reply! I did face this problem in a real workload. I tried to
simplify my code; the result is still a little complex, and it is in C++. The
code is attached at the end.

There is a thread queue that executes n tasks with m worker threads. Each task
stores some calculated (field, value) data into a map. In my real workload I
calculate double values from loaded data and store them in the map; the
calculation itself is very complex, so I simplified it here. I think creating
the map's key (a short string) behaves like malloc-ing small pieces, and
creating the map's value (a large vector) behaves like malloc-ing large
pieces. However, if I don't use the thread queue, the memory is released, so I
guess some allocation made by the STL machinery inside the thread queue
compounds the problem. In fact, I used gdb to inspect the tcache/fast bins
near the top of the heap and found the chunks there were probably allocated by
something in the STL. Also, if I comment out the "return dp;" line in the
CreateDP lambda below and un-comment the "delete dp;" and "return nullptr;"
lines above it, the memory is released. I don't know why.

You can compile it with "g++ test.cpp -o test -lpthread" and run it with
"./test task_number thread_number".

#include <atomic>
#include <condition_variable>
#include <cstdio>
#include <cstdlib>
#include <functional>
#include <future>
#include <list>
#include <map>
#include <memory>
#include <mutex>
#include <queue>
#include <string>
#include <thread>
#include <utility>
#include <vector>

using namespace std;

class TestClass {
 public:
  void DoSomething() {
    const int Count = 10000;

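    // Each iteration allocates a small piece (the map node with its short
    // string key) and large pieces (a vector of 10000 doubles, ~80 KiB,
    // which the assignment below copies into the map); everything is freed
    // when 'values' goes out of scope at the end of this function.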
    map<string, vector<double>> values;

    for (int i = 0; i < Count; ++i) {
      vector<double> v(10000);
      values[std::to_string(i)] = v;
    }
  }
};

class MultiThreadWorkQueue {
 public:
  // Constructor.
  // cache_size is the maximum capacity of the result cache and n_threads is
  // the number of worker threads; in this simplified test both are used
  // exactly as passed in.
  MultiThreadWorkQueue(int cache_size, int n_threads)
    : cache_size_(cache_size),
      n_threads_(n_threads) {
    for (int i = 0; i < n_threads_; ++i) {
      workers_.push_back(
          std::thread(&MultiThreadWorkQueue::ProcessTasks, this));
    }
  }

  ~MultiThreadWorkQueue() {
    Abort();
  }

  void Enqueue(std::function<TestClass*()>&& func) {
    {
      std::unique_lock<std::mutex> ul(tasks_mutex_);
      tasks_.emplace(std::move(func));
    }

    worker_cv_.notify_one();
  }

  // Gets the result from the next task in the queue. If it is still pending,
  // blocks the current thread until the result is available.
  //
  // Note that calling this after Abort() will crash.
  TestClass* Dequeue() {
    std::unique_lock<std::mutex> ul(tasks_mutex_);

    dequeue_cv_.wait(ul, [this] {
      return aborted_ || returns_.size() > 0;
    });

    std::future<TestClass*> future = std::move(returns_.front());
    returns_.pop();

    ul.unlock();

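    // A slot in the result cache has just freed up; wake one worker.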
    worker_cv_.notify_one();
    return future.get();
  }

  // Stop executing any new tasks and join all the worker threads.
  void Abort() {
    {
      std::unique_lock<std::mutex> ul(tasks_mutex_);
      if (aborted_) {
        return;
      } else {
        aborted_ = true;
      }
    }

    worker_cv_.notify_all();
    dequeue_cv_.notify_all();

    for (auto& thread : workers_) {
      thread.join();
    }
  }

  // Size = N(Enqueue) - N(Dequeue).
  size_t Size() {
    std::unique_lock<std::mutex> ul(tasks_mutex_);
    return returns_.size() + tasks_.size();
  }

 private:
  void ProcessTasks() {
    std::unique_lock<std::mutex> ul(tasks_mutex_);

    while (!aborted_) {
      worker_cv_.wait(ul, [this]() {
        return aborted_ ||
               (tasks_.size() > 0 && returns_.size() < cache_size_);
      });

      if (aborted_) {
        break;
      }

      std::packaged_task<TestClass*()> t;
      t.swap(tasks_.front());
      tasks_.pop();

      returns_.emplace(t.get_future());

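      // Run the task outside the lock so Dequeue() and other workers can
      // make progress while it executes.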
      ul.unlock();
      dequeue_cv_.notify_one();
      t();
      ul.lock();
    }
  }

  std::mutex tasks_mutex_;

  std::atomic<bool> aborted_ { false };
  int cache_size_;
  int n_threads_;
  std::condition_variable worker_cv_;
  std::condition_variable dequeue_cv_;
  std::queue<std::packaged_task<TestClass*()>> tasks_;
  std::queue<std::future<TestClass*>> returns_;
  std::list<std::thread> workers_;
};

int main(int argc, char** argv) {
  if (argc < 3) {
    fprintf(stderr, "usage: %s task_number thread_number\n", argv[0]);
    return 1;
  }
  int n = atoi(argv[1]);
  int thread_num = atoi(argv[2]);

  auto CreateDP = [] {
    TestClass* dp = new TestClass;
    dp->DoSomething();
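    // Un-commenting the two lines below (and commenting out "return dp;")
    // frees the object on the worker thread itself; as noted above, the
    // memory is then returned to the OS in that case.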
    // delete dp;
    // return nullptr;
    return dp;
  };

  printf("* before run, press enter to continue");
  fflush(stdout);
  std::getchar();

  if (thread_num > 0) {
    printf("Multi-thread\n");
    MultiThreadWorkQueue work_queue(10, thread_num);
    for (int i = 0; i < n; ++i) {
      work_queue.Enqueue(CreateDP);
    }
    for (int i = 0; i < n; ++i) {
      std::unique_ptr<TestClass> dp(work_queue.Dequeue());
    }
  } else {
    printf("Single-thread\n");
    for (int i = 0; i < n; ++i) {
      fflush(stdout);
      std::unique_ptr<TestClass> dp(CreateDP());
    }
  }
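  // As suggested in comment #1, a malloc_trim(0) call (declared in
  // <malloc.h>) could go here to release consolidated free pages back to the
  // OS; it is left out so the test shows the untrimmed behavior.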
  printf("* after run, press enter to continue");
  fflush(stdout);
  std::getchar();
  return 0;
}



Thread overview: 9+ messages
2020-11-28 13:58 [Bug malloc/26969] New: " keyid.w at qq dot com
2020-11-28 13:59 ` [Bug malloc/26969] " keyid.w at qq dot com
2020-11-29  3:24 ` keyid.w at qq dot com
2020-12-01  1:03 ` uwydoc at gmail dot com
2020-12-01  2:51 ` carlos at redhat dot com
2020-12-01  8:43 ` keyid.w at qq dot com [this message]
2021-01-29 16:08 ` dimahabr at gmail dot com
2021-02-01  8:52 ` keyid.w at qq dot com
2022-06-29 16:34 ` romash at rbbn dot com
