From: ASSI
To: cygwin-apps@cygwin.com
Subject: Re: Extreme slowdown due to malloc?
References: <87mty66fw5.fsf@Rainer.invalid> <012a9e3c-ec24-f307-a3c4-9f2589d54e34@maxrnd.com>
Date: Tue, 22 Dec 2020 07:34:33 +0100
In-Reply-To: <012a9e3c-ec24-f307-a3c4-9f2589d54e34@maxrnd.com> (Mark Geisert's message of "Mon, 21 Dec 2020 20:37:14 -0800")
Message-ID: <87k0tae4cm.fsf@Otto.invalid>

Mark Geisert writes:
>> I've been experimenting a bit with ZStandard dictionaries.  The
>> dictionary builder is probably not the most optimized piece of software
>
> Is this what leads you to suspect malloc?  Really heavy use of malloc?

That piece of code does a lot of smallish allocations when it builds up
the suffix array.  That is also where I suspect the high number of page
faults comes from, as I have no other explanation.

>> The obvious difference is that I/O takes a lot longer on Cygwin
>> (roughly a minute for reading all the data) and that I have an insane
>> amount of page faults on Windows (as reported by time) vs. none on
>> Linux.
>
> How much RAM does the Windows machine have?  Do you have a paging
> file?  Is it fixed size or "let Windows manage"?  How big is it?

This machine is fully expanded to 32GiB; it won't run out of physical
memory for any of the tests shown.  The full dictionary build that I'd
wanted to have run on this machine would come in as something like
100000 in that table and takes around 6 hours consuming 15GiB on my
Linux box (4T), so if I could run that I might see a bit of memory
pressure.  The pagefile is set up as fixed (I think 64GiB), but is
unused.

>> While doing that I also noticed that top shows the program taking 100%
>> CPU in the multithreaded portion of the program, while it should show
>> close to 800% at that time.  I'm not sure if that information just isn't
>> available on Windows or if procps-ng needs to look someplace else for
>> that to be shown as expected.
>
> No offense, but are you sure it's actually running multi-threaded on
> Windows?

Yes, once it finally comes to the statistics part of the dictionary
build it'll use all eight threads no problem.  That's why it doesn't
take twice as long (wall time) at the 400 mark compared to the 200.

> I have a Cygwin malloc speedup patch that *might* help the m-t part.
> I'll prepare and submit that to cygwin-patches shortly.

Well, if you want to test it with the new ZStandard, give it a spin…
I'll check how far I can strip that test down so you can use the Cygwin
source tree for testing.


Regards,
Achim.
-- 
+<[Q+ Matrix-12 WAVE#46+305 Neuron microQkb Andromeda XTk Blofeld]>+

Waldorf MIDI Implementation & additional documentation:
http://Synth.Stromeko.net/Downloads.html#WaldorfDocs
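
For anyone who wants to poke at the "many smallish allocations plus page faults" pattern described above without the ZStandard sources, here is a minimal sketch in C. It is not the stripped-down test mentioned in the message and not the dictionary builder's code; the function names (page_faults, churn_small_allocs, report_churn) and the block size and count are assumptions chosen purely for illustration. It only uses malloc/free and POSIX getrusage().

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/resource.h>

/* Page faults (minor + major) charged to this process so far,
   or -1 if getrusage() fails. */
static long page_faults(void)
{
    struct rusage ru;
    if (getrusage(RUSAGE_SELF, &ru) != 0)
        return -1;
    return ru.ru_minflt + ru.ru_majflt;
}

/* Allocate `count` blocks of `size` bytes, touch every byte so the
   pages are really mapped, then free everything again.
   Returns the number of blocks that were successfully allocated. */
static size_t churn_small_allocs(size_t count, size_t size)
{
    assert(size > 0);

    void **blocks = malloc(count * sizeof *blocks);
    size_t done = 0;

    if (blocks == NULL)
        return 0;

    for (size_t i = 0; i < count; i++) {
        blocks[i] = malloc(size);
        if (blocks[i] == NULL)
            break;
        memset(blocks[i], 0xA5, size);
        done++;
    }

    for (size_t i = 0; i < done; i++)
        free(blocks[i]);
    free(blocks);

    return done;
}

/* Hypothetical driver: run one churn pass and print the page-fault
   delta that getrusage() reports for it. */
static long report_churn(size_t count, size_t size)
{
    long before = page_faults();
    size_t done = churn_small_allocs(count, size);
    long after = page_faults();

    printf("%zu allocations of %zu bytes: %ld page faults\n",
           done, size, after - before);
    return after - before;
}
```

Calling report_churn() from main() with a large count (say a few million blocks of a few dozen bytes) and running the same binary source under Cygwin and on Linux would give a rough comparison of the fault counts; whether it reproduces the slowdown seen with the dictionary builder is an open question.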