From mboxrd@z Thu Jan 1 00:00:00 1970 Return-Path: Received: (qmail 28255 invoked by alias); 4 Jul 2018 12:08:32 -0000 Mailing-List: contact libc-help-help@sourceware.org; run by ezmlm Precedence: bulk List-Subscribe: List-Post: List-Help: , Sender: libc-help-owner@sourceware.org Received: (qmail 28244 invoked by uid 89); 4 Jul 2018 12:08:31 -0000 Authentication-Results: sourceware.org; auth=none X-Spam-SWARE-Status: No, score=-2.9 required=5.0 tests=AWL,BAYES_00,KAM_SHORT,RCVD_IN_DNSWL_NONE,SPF_PASS autolearn=ham version=3.3.2 spammy=HX-Received:sk:y11-v6m, experiments, utilization, Twister X-HELO: mail-qt0-f174.google.com Received: from mail-qt0-f174.google.com (HELO mail-qt0-f174.google.com) (209.85.216.174) by sourceware.org (qpsmtpd/0.93/v0.84-503-g423c35a) with ESMTP; Wed, 04 Jul 2018 12:08:29 +0000 Received: by mail-qt0-f174.google.com with SMTP id q12-v6so4308162qtp.6 for ; Wed, 04 Jul 2018 05:08:29 -0700 (PDT) DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed; d=linaro.org; s=google; h=subject:to:references:from:openpgp:autocrypt:message-id:date :user-agent:mime-version:in-reply-to:content-language :content-transfer-encoding; bh=nUoS+ZhuXS0b+UVV8Y6w6KJO8ffsmYJpaTeE3uY5TdU=; b=BLIGW2wJ6JBidyCvgX3PY0TGrNxlknjxmkS1pIckXs1YdsOdVNq7gSE7FlleVGag54 qFxotTk8jUhiyWzQZC2+swZU1tUOUbYVwRXvJVkoKbf+n+QtTlDSIjBcF5h/TiWp6U6H pbBRUsob3yrW+dO9sthxtd+6X/sH0q0ODmOPg= Return-Path: Received: from [10.0.0.106] ([179.159.11.160]) by smtp.googlemail.com with ESMTPSA id p63-v6sm2243031qkb.46.2018.07.04.05.08.26 for (version=TLS1_2 cipher=ECDHE-RSA-AES128-GCM-SHA256 bits=128/128); Wed, 04 Jul 2018 05:08:27 -0700 (PDT) Subject: Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward To: libc-help@sourceware.org References: <3c1e0790-9e50-f27b-b78d-b9149517b3e6@mtu.edu> From: Adhemerval Zanella Openpgp: preference=signencrypt Message-ID: <59fda32d-c0de-6365-8725-9c0e061aaade@linaro.org> Date: Wed, 04 Jul 2018 12:08:00 -0000 User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:52.0) Gecko/20100101 Thunderbird/52.8.0 MIME-Version: 1.0 In-Reply-To: Content-Type: text/plain; charset=utf-8 Content-Transfer-Encoding: 8bit X-IsSubscribed: yes X-SW-Source: 2018-07/txt/msg00008.txt.bz2 On 03/07/2018 20:30, Carlos O'Donell wrote: > On Tue, Jul 3, 2018 at 4:10 PM, jrmarsha wrote: >> >>>> Hello all, >>>> >>>> I've been implementing a new version of Timsort in C++ using more >>>> standard >>>> facilities. One thing I've come across which adds a divide between a >>>> larger >>>> codebase and faster code is the implementation of std::inplace_merge with >>>> sufficient memory. This goes to the specific functions >>>> __move_merge_adaptive, and __move_merge_adaptive_backward. In >>>> particular, >>>> there is an optimization where each insertion searches for the maximum >>>> number of elements it can move. A brief in context explanation is here: >>>> https://en.wikipedia.org/wiki/Timsort#Individual_merges . Would such a >>>> trick have a place in glibc or is the overhead likely too great in the >>>> general case? >>> >>> It's possible. We have a tunable framework for changing runtime >>> parameters like this based on application needs. The interface only >>> allows you to make the choice once at startup. So it may not be >>> sufficient to meet the needs of all users. However some choice in this >>> area may be better than no choice. >>> >>> Cheers, >>> Carlos. >> >> How would I get new code incorporated into this and make that decision >> intelligently at compile time? What would be a good way to add this >> inplace_merge variant in? > > If you are looking specifically at std::inplace_merge, then you need to talk > to the GCC developers, since libstdc++ is part of GCC. > Please see: https://gcc.gnu.org/ > > If you are interested in similar strategies for the C library, then you can look > at the qsort() API, which can be implemented in any way really as long as > it meets the API requirements (and is written in C). In glibc you might use a > tunable to switch between one algorithm and another. You can see this > implementation in stdlib/qsort.c in glibc, which does implement quicksort. In fact I think we should use a O(1) space algorithm for qsort() instead of current strategy of quicksort plus mergesort. This allows qsort() to be fully AS-Safe and AC-Safe. I would also avoid adding a tunable for qsort, if the idea is to provide different algorithm I would rather use FreeBSD strategy to provide explicit implementations [1]. I think it is not a good expectation to set the algorithms expected complexity, in both runtime and memory consumption, by an external environment (for instance user might use qsort in a signal handler and it might start to fail due malloc usage if the tunable is set). I have sent patchset upstream for review to refactor qsort [2]. It changed to use the internal quicksort implementation as default (since our implementation has the optimization to use O(1) space) plus some small optimization to the generic case (which also seems to most used in my analysis). Ondrej has raised some concerns about setting qsort to use quicksort as default which has O(n^2) in worst-case performance as a potential issue and I replied that we might try to get quicksort to get O(n*ln(n)) statistically as a QoI. I my experiments I used the Mersenner Twister implementation I used for benchmark as the source of entropy for pivot selection, but we might also try to use the new arc4random instead (although Florian has stated it might require more work to use internally). Another option, which I considering for next submission for 2.29, is to just smoothsort algorithm as an alternative implementation. It has also O(1) space utilization, however it is faster only for already sorted input being slower for random, mostly sorted or repeated inputs. For reference I have pushed the implementation I measured against on personal branch [4]. Or we can just use a simple heapsort, as Linux kernel does internally, which is slower than both heapsort and quicksort, but the code is a lot simpler than both. There is also the block sort, which have O(n*ln(n)) worse-case complexity and O(1) utilization, but I have not evaluated it. Its description seems rather complex, so I am not sure if it worth the trouble. [1] https://www.freebsd.org/cgi/man.cgi?query=mergesort&sektion=3 [2] https://sourceware.org/ml/libc-alpha/2018-01/msg00626.html [3] https://sourceware.org/ml/libc-alpha/2018-02/msg00358.html [4] https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=shortlog;h=refs/heads/azanella/qsort-smooth