From mboxrd@z Thu Jan  1 00:00:00 1970
Return-Path: <mark@maxrnd.com>
Received: from m0.truegem.net (m0.truegem.net [69.55.228.47])
 by sourceware.org (Postfix) with ESMTPS id 1287E385800A
 for <cygwin-apps@cygwin.com>; Mon, 18 Jan 2021 07:07:32 +0000 (GMT)
DMARC-Filter: OpenDMARC Filter v1.3.2 sourceware.org 1287E385800A
Authentication-Results: sourceware.org;
 dmarc=none (p=none dis=none) header.from=maxrnd.com
Authentication-Results: sourceware.org; spf=none smtp.mailfrom=mark@maxrnd.com
Received: (from daemon@localhost)
 by m0.truegem.net (8.12.11/8.12.11) id 10I77Vgt096862
 for <cygwin-apps@cygwin.com>; Sun, 17 Jan 2021 23:07:31 -0800 (PST)
 (envelope-from mark@maxrnd.com)
Received: from 162-235-43-67.lightspeed.irvnca.sbcglobal.net(162.235.43.67),
 claiming to be "[192.168.1.20]"
 via SMTP by m0.truegem.net, id smtpdJ9EGIG; Sun Jan 17 23:07:24 2021
Subject: Re: Extreme slowdown due to malloc?
To: Cygwin-Apps <cygwin-apps@cygwin.com>
References: <87mty66fw5.fsf@Rainer.invalid>
 <012a9e3c-ec24-f307-a3c4-9f2589d54e34@maxrnd.com>
 <87k0tae4cm.fsf@Otto.invalid> <87eej3beys.fsf@Rainer.invalid>
From: Mark Geisert <mark@maxrnd.com>
Message-ID: <cf2ade9f-a2fa-ed59-f754-9f558b66ef2d@maxrnd.com>
Date: Sun, 17 Jan 2021 23:07:24 -0800
User-Agent: Mozilla/5.0 (Windows NT 10.0; WOW64; rv:52.0) Gecko/20100101
 Firefox/52.0 SeaMonkey/2.49.4
MIME-Version: 1.0
In-Reply-To: <87eej3beys.fsf@Rainer.invalid>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
X-Spam-Status: No, score=-4.8 required=5.0 tests=BAYES_00, KAM_DMARC_STATUS,
 KAM_LAZY_DOMAIN_SECURITY, NICE_REPLY_A, SPF_HELO_NONE, SPF_NONE,
 TXREP autolearn=no autolearn_force=no version=3.4.2
X-Spam-Checker-Version: SpamAssassin 3.4.2 (2018-09-13) on
 server2.sourceware.org
X-BeenThere: cygwin-apps@cygwin.com
X-Mailman-Version: 2.1.29
Precedence: list
List-Id: Cygwin package maintainer discussion list <cygwin-apps.cygwin.com>
List-Unsubscribe: <https://cygwin.com/mailman/options/cygwin-apps>,
 <mailto:cygwin-apps-request@cygwin.com?subject=unsubscribe>
List-Archive: <https://cygwin.com/pipermail/cygwin-apps/>
List-Post: <mailto:cygwin-apps@cygwin.com>
List-Help: <mailto:cygwin-apps-request@cygwin.com?subject=help>
List-Subscribe: <https://cygwin.com/mailman/listinfo/cygwin-apps>,
 <mailto:cygwin-apps-request@cygwin.com?subject=subscribe>
X-List-Received-Date: Mon, 18 Jan 2021 07:07:33 -0000

Hi Achim,
Thank you very much for the detailed instructions and also the comparison data 
Linux vs Cygwin for all those testcases.

Achim Gratz wrote:
> ASSI writes:
>>> I have a Cygwin malloc speedup patch that *might* help the m-t part.
>>> I'll prepare and submit that to cygwin-patches shortly.
>>
>> Well, if you want to test it with the new ZStandard, give it a spin…
>> I'll check how far I can strip that test down so you can use the Cygwin
>> source tree for testing.

I've now done this.  And I don't see any improvement.  Reasons below...

> OK, it's actually pretty simple, do this inside a checkout of
> newlib-cygwin:
> 
> $ find newlib winsup texinfo -type f > flist
> $ zstd --train-cover --ultra -22 -T0 -vv --filelist=flist -o dict-cover
> 
> On Linux, it reads in all the files in about two seconds, while it takes
> quite a while longer on Cygwin.  But the real bummer is that
> constructing the partial suffix arrays (which is single-threaded) will
> seemingly take forever, while it's done much faster on Linux.  You can
> pare down the number of files like that:
> 
> $ shuf -n 320 flist > slist

I've settled on '-n 1600' for testing.  I'm running these Cygwin tests on a 2C/4T 
i3-something with 8GB memory and an SSD used for filesystem and page file.  Not a 
dog but clearly not a dire-wolf either.

The page fault numbers are comparable to what you've shown for Cygwin on your 
system.  The long pause after zstd prints "Constructing partial suffix array" is 
because zstd is CPU-bound in qsort() for a long time, with no paging during that 
time.  Only when the statistics start being printed out does the paging insanity 
start.

What I discovered is that zstd repeatedly asks malloc() for large memory blocks, 
presumably to load files into, then free()s them.  Any malloc request of 256K or 
larger is fulfilled by mmap() rather than by enlarging the heap.  But crucially, 
our malloc has no mechanism for hanging on to freed mmap()ed pages for future 
use: if you free an mmap()ed block, it is munmap()ed immediately.  So for zstd's 
usage pattern you get an incredible number of page faults to satisfy the mmap()s, 
and Windows seems to take a non-trivial amount of time for each mmap().
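
The allocation pattern described above can be imitated with a small C sketch. 
This is not zstd's code; churn_large_blocks() is a hypothetical helper, and the 
block size and round count are whatever the caller picks.  With a block size at 
or above the 256K threshold, each round should go through mmap() on allocation 
and be unmapped again on free(), so every page touched is a fresh page fault.

```c
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>

/* Hypothetical helper (not from zstd or Cygwin): allocate one large block
   per round, touch one byte in every page so the pages are really faulted
   in, then free the block.  Returns the total number of pages touched, or
   0 if the page size cannot be determined or an allocation fails. */
size_t churn_large_blocks(size_t block_size, int rounds)
{
    long page = sysconf(_SC_PAGESIZE);
    size_t touched = 0;

    if (page <= 0)
        return 0;

    for (int i = 0; i < rounds; i++) {
        /* volatile keeps the compiler from optimizing the stores away */
        volatile unsigned char *p = malloc(block_size);
        if (p == NULL)
            return 0;

        for (size_t off = 0; off < block_size; off += (size_t)page) {
            p[off] = 1;
            touched++;
        }

        /* For an mmap()ed block this is where the pages are given back. */
        free((void *)p);
    }
    return touched;
}
```

Calling churn_large_blocks(1024 * 1024, 1000) under a timer or a page-fault 
counter, once on Linux and once on Cygwin, would be one way to compare the two 
without involving zstd at all.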

I will be looking at our malloc implementation to see if tuning something can fix 
this behavior.  Adding code is the last resort.
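
As an illustration of what "tuning" could look like from the application side, 
here is a minimal sketch that raises the mmap threshold through mallopt().  It 
assumes the C library exposes M_MMAP_THRESHOLD in <malloc.h>, as glibc does; 
whether this particular knob helps with Cygwin's malloc is untested here, and 
raise_mmap_threshold() is a hypothetical name.

```c
#include <stdlib.h>
#if defined(__GLIBC__) || defined(__CYGWIN__)
#include <malloc.h>   /* mallopt(), M_MMAP_THRESHOLD (assumed available) */
#endif

/* Hypothetical helper: ask malloc to serve requests below `bytes` from the
   heap instead of mmap().  Returns 1 if mallopt() accepted the value, 0 if
   it rejected it, and -1 if this C library does not offer the knob. */
int raise_mmap_threshold(size_t bytes)
{
#ifdef M_MMAP_THRESHOLD
    return mallopt(M_MMAP_THRESHOLD, (int)bytes) ? 1 : 0;
#else
    (void)bytes;
    return -1;
#endif
}
```

For example, raise_mmap_threshold(4 * 1024 * 1024) early in main() would keep 
blocks under 4M on the heap, where free() can leave them available for reuse 
instead of unmapping them.
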
Thanks again for the great testcase.

..mark