Subject: Re: Extreme slowdown due to malloc?
To: Cygwin-Apps
From: Mark Geisert
Date: Mon, 21 Dec 2020 20:37:14 -0800

Hi Achim,

Achim Gratz wrote:
> I've been experimenting a bit with ZStandard dictionaries. The
> dictionary builder is probably not the most optimized piece of software

Is this what leads you to suspect malloc? Really heavy use of malloc?

> and if you feed it large amounts of data it needs quite a lot of
> cycles. So I thought I run some of this on Cygwin since that machine is
> faster and has more threads than my Linux box. Unfortunately that plan
> shattered due to extreme slowness of the first (single-threaded) part of
> the dictionary builder that sets up the partial suffix array.
>
> |------+---------------+---------------|
> |      | E3-1225v3     | E3-1276v3     |
> |      | 4C/4T         | 4C/8T         |
> |      | 3.2/3.6GHz    | 3.6/4.0GHz    |
> |------+---------------+---------------|
> |  100 | 00:14 / 55s   | 00:23 / 126s  |
> |  200 | 00:39 / 145s  | 01:10 / 241s  |
> |  400 | 01:12 / 266s  | 01:25 / 322s  |
> |  800 | 02:06 / 466s  | 11:12 / 1245s |
> | 1600 | 03:57 / 872s  | > 2hr         |
> | 3200 | 08:03 / 1756s | n/a           |
> | 6400 | 16:17 / 3581s | n/a           |
> |------+---------------+---------------|
>
> The obvious difference is that I/O takes a lot longer on Cygwin (roughly
> a minute for reading all the data) and that I have an insane amount of
> page faults on Windows (as reported by time) vs. none on Linux.

How much RAM does the Windows machine have? Do you have a paging file? Is
it fixed size or "let Windows manage"? How big is it?

> While doing that I also noticed that top shows the program taking 100%
> CPU in the multithreaded portion of the program, while it should show
> close to 800% at that time. I'm not sure if that information just isn't
> available on Windows or if procps-ng needs to look someplace else for
> that to be shown as expected.

No offense, but are you sure it's actually running multi-threaded on
Windows?

I have a Cygwin malloc speedup patch that *might* help the m-t part. I'll
prepare and submit that to cygwin-patches shortly.

Cheers,

..mark
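
One rough way to compare the page-fault counts discussed above on both machines is to read them from `getrusage` around a piece of work. The following is only a minimal sketch under stated assumptions: it assumes a Python build that provides the standard `resource` module (native Windows Python does not), and `touch_buffer` and `faults_during` are hypothetical helper names made up for illustration. It is not part of zstd or Cygwin, and it does not reproduce the dictionary builder's allocation pattern.

```python
import resource


def page_faults():
    """Return (minor, major) page-fault counts for this process so far."""
    usage = resource.getrusage(resource.RUSAGE_SELF)
    return usage.ru_minflt, usage.ru_majflt


def faults_during(func, *args):
    """Run func(*args) and report how many page faults occurred meanwhile."""
    minor_before, major_before = page_faults()
    result = func(*args)
    minor_after, major_after = page_faults()
    return result, minor_after - minor_before, major_after - major_before


def touch_buffer(size_bytes, page_size=4096):
    """Allocate a buffer and write one byte per assumed 4 KiB page."""
    buf = bytearray(size_bytes)
    for offset in range(0, size_bytes, page_size):
        buf[offset] = 1
    return len(buf)


if __name__ == "__main__":
    # 256 MiB is an arbitrary illustrative size, not a value from the thread.
    size, minor, major = faults_during(touch_buffer, 256 * 1024 * 1024)
    print(f"touched {size} bytes: {minor} minor faults, {major} major faults")
```

How the counts are split between minor and major faults depends on the platform, so the totals are more comparable across systems than either number alone.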