From: David Muse
To: libc-alpha@sourceware.org
Date: Fri, 22 Mar 2019 14:53:00 -0000
Subject: thread heap leak?
Message-Id: <20190322105304.1e848cefb45fb65d32a96b23@firstworks.com>

Hello all,

I've been chasing a series of strange bugs for months now, and I hope someone can shed some light on them.

We have this little server program that listens for client connections on an inet socket. When it receives a connection, it calls pthread_create() to start a detached thread that handles the request, and the main thread goes back to waiting for more client connections. The child thread talks to the client and usually forks off a new process to handle the client request.
It then relays data back and forth between the child process's stdin/stdout and the client. Eventually the child process calls exit(), and the thread calls pthread_exit().

This program has run without problems for about 6 years. We recently started deploying it to Amazon AWS and ran into all kinds of trouble. We first hit the fork+malloc error, and the program would crash 4 or 5 times per day. We upgraded to glibc 2.28 to solve that. By "upgraded", I mean that we compiled glibc 2.28 ourselves and run the program by invoking its dynamic loader manually, like this:

    /opt/glibc-2.28-runtime/lib/ld-2.28.so --library-path /opt/glibc-2.28-runtime/lib:/lib64:/usr/lib64 ./proxy -port 3490

This solved the fork+malloc error. However, we then started noticing what appeared to be a memory leak. Valgrind couldn't find anything, though, and we eventually discovered that anonymous heap segments were piling up. For example:

    /proc/943/maps:
    ... ibc-2.29-runtime/lib/libc-2.29.so
    2b933f891000-2b933fa91000 ---p 001e0000 103:01 487930 /opt/glibc-2.29-runtime/lib/libc-2.29.so
    2b933fa91000-2b933fa95000 r--p 001e0000 103:01 487930 /opt/glibc-2.29-runtime/lib/libc-2.29.so
    2b933fa95000-2b933fa97000 rw-p 001e4000 103:01 487930 /opt/glibc-2.29-runtime/lib/libc-2.29.so
    ... these "anonymous" segments ...
    2b933fa97000-2b933fa9b000 rw-p 00000000 00:00 0
    2b933fa9b000-2b933fa9c000 ---p 00000000 00:00 0
    2b933fa9c000-2b933fc9c000 rw-p 00000000 00:00 0
    2b933fc9c000-2b933fc9d000 ---p 00000000 00:00 0
    2b933fc9d000-2b933fe9d000 rw-p 00000000 00:00 0
    2b9340000000-2b934007f000 rw-p 00000000 00:00 0
    2b934007f000-2b9344000000 ---p 00000000 00:00 0
    2b9344000000-2b9344032000 rw-p 00000000 00:00 0
    2b9344032000-2b9348000000 ---p 00000000 00:00 0
    2b9348000000-2b9348001000 ---p 00000000 00:00 0
    2b9348001000-2b9348201000 rw-p 00000000 00:00 0
    2b9348201000-2b9348202000 ---p 00000000 00:00 0
    2b9348202000-2b9348402000 rw-p 00000000 00:00 0
    2b9348402000-2b9348403000 ---p 00000000 00:00 0
    2b9348403000-2b9348603000 rw-p 00000000 00:00 0
    2b9348603000-2b9348604000 ---p 00000000 00:00 0
    2b9348604000-2b9348804000 rw-p 00000000 00:00 0
    2b934c000000-2b934c023000 rw-p 00000000 00:00 0
    2b934c023000-2b9350000000 ---p 00000000 00:00 0
    2b9350000000-2b9350021000 rw-p 00000000 00:00 0
    2b9350021000-2b9354000000 ---p 00000000 00:00 0
    2b9354000000-2b9354084000 rw-p 00000000 00:00 0
    2b9354084000-2b9358000000 ---p 00000000 00:00 0
    2b9358000000-2b9358021000 rw-p 00000000 00:00 0
    2b9358021000-2b935c000000 ---p 00000000 00:00 0
    7fff8287e000-7fff8289f000 rw-p 00000000 00:00 0 [stack]
    7fff828f4000-7fff828f7000 r--p 00000000 00:00 0 [vvar]
    7fff828f7000-7fff828f9000 r-xp 00000000 00:00 0 [vdso]
    ffffffffff600000-ffffffffff601000 r-xp 00000000 00:00 0 [vsyscall]

After quite a bit of googling, I learned that these are called "anonymous" segments and that they are used for various things, including mmap()ed memory that isn't backed by a file, and thread stacks. Over time, more and more of them accumulate until top shows the app's VIRT at around 4G (I think; it may only be 2G). Then it crashes. RES is never more than a few MB.

Upgrading to glibc 2.29 dramatically reduced this behavior, but the heap still grows, and the app crashes every 4 or 5 days.

Any ideas what might be going on?

Thanks!

David Muse
david.muse@firstworks.com
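The anonymous mappings in the /proc/943/maps excerpt fall into two repeating shapes: 2 MiB read-write regions that each sit directly above a single 4 KiB no-access (---p) page, and regions that start on a 64 MiB boundary and cover exactly 64 MiB, mostly ---p with a small read-write head. The first shape appears to match how glibc lays out a thread stack with its guard page; the second appears to match the per-thread malloc heaps glibc reserves on 64-bit systems. A minimal Python sketch for counting both shapes over time is below. The 4 KiB and 64 MiB constants and both classification rules are heuristics assumed for illustration, not guarantees from glibc, and the script name anon_maps.py is hypothetical.

```python
import re
import sys

# One /proc/<pid>/maps line: start-end perms offset dev inode [pathname]
MAPS_LINE = re.compile(
    r"^([0-9a-f]+)-([0-9a-f]+)\s+(\S{4})\s+[0-9a-f]+\s+\S+\s+(\d+)\s*(.*)$"
)

PAGE = 4096                          # assumed guard-page size (one x86-64 page)
HEAP_RESERVATION = 64 * 1024 * 1024  # assumed per-thread heap reservation on 64-bit


def parse_anonymous(text):
    """Return (start, end, perms) for mappings with inode 0 and no pathname."""
    result = []
    for line in text.splitlines():
        m = MAPS_LINE.match(line.strip())
        if m and m.group(4) == "0" and m.group(5) == "":
            result.append((int(m.group(1), 16), int(m.group(2), 16), m.group(3)))
    return result


def classify(text):
    """Heuristically count stack-like and arena-like anonymous mappings."""
    maps = parse_anonymous(text)
    stats = {"anonymous": len(maps), "stack_like": 0, "arena_like": 0,
             "rw_bytes": 0, "reserved_bytes": 0}
    for i, (start, end, perms) in enumerate(maps):
        if perms.startswith("---"):
            # No-access regions: guard pages and unused heap reservations.
            stats["reserved_bytes"] += end - start
            continue
        if not perms.startswith("rw"):
            continue
        stats["rw_bytes"] += end - start
        prev = maps[i - 1] if i > 0 else None
        nxt = maps[i + 1] if i + 1 < len(maps) else None
        # Arena-like: starts on a 64 MiB boundary, and the rw part plus an
        # adjacent no-access tail covers exactly 64 MiB.
        if start % HEAP_RESERVATION == 0:
            limit = start + HEAP_RESERVATION
            if end == limit or (nxt is not None and nxt[0] == end
                                and nxt[1] == limit and nxt[2].startswith("---")):
                stats["arena_like"] += 1
                continue
        # Stack-like: sits directly above a single no-access guard page.
        if (prev is not None and prev[1] == start
                and prev[1] - prev[0] == PAGE and prev[2].startswith("---")):
            stats["stack_like"] += 1
    return stats


if __name__ == "__main__" and len(sys.argv) == 2 and sys.argv[1].isdigit():
    # Usage: python3 anon_maps.py PID   (Linux only)
    with open(f"/proc/{sys.argv[1]}/maps") as f:
        print(classify(f.read()))
```

Run against the excerpt above, the sketch should report 25 anonymous mappings: 6 stack-like and 6 arena-like. Logging `python3 anon_maps.py 943` periodically would show which of the two shapes is accumulating. Reading another process's maps file requires running as the same user (or having ptrace permission over it).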