Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive

public inbox for libc-help@sourceware.org
 help / color / mirror / Atom feed

* Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
@ 2018-07-03  4:16 jrmarsha
  2018-07-03 20:04 ` Carlos O'Donell
  0 siblings, 1 reply; 7+ messages in thread
From: jrmarsha @ 2018-07-03  4:16 UTC (permalink / raw)
  To: libc-help

Hello all,

I've been implementing a new version of Timsort in C++ using more 
standard facilities.Â  One thing I've come across which adds a divide 
between a larger codebase and faster code is the implementation of 
std::inplace_merge with sufficient memory.Â  This goes to the specific 
functionsÂ  __move_merge_adaptive, and __move_merge_adaptive_backward.Â  
In particular, there is an optimization where each insertion searches 
for the maximum number of elements it can move.Â  A brief in context 
explanation is here: 
https://en.wikipedia.org/wiki/Timsort#Individual_merges .Â  Would such a 
trick have a place in glibc or is the overhead likely too great in the 
general case?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
  2018-07-03  4:16 Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward jrmarsha
@ 2018-07-03 20:04 ` Carlos O'Donell
  2018-07-03 20:10   ` jrmarsha
  0 siblings, 1 reply; 7+ messages in thread
From: Carlos O'Donell @ 2018-07-03 20:04 UTC (permalink / raw)
  To: jrmarsha; +Cc: libc-help

On Tue, Jul 3, 2018 at 12:16 AM, jrmarsha <jrmarsha@mtu.edu> wrote:
> Hello all,
>
> I've been implementing a new version of Timsort in C++ using more standard
> facilities.  One thing I've come across which adds a divide between a larger
> codebase and faster code is the implementation of std::inplace_merge with
> sufficient memory.  This goes to the specific functions
> __move_merge_adaptive, and __move_merge_adaptive_backward.  In particular,
> there is an optimization where each insertion searches for the maximum
> number of elements it can move.  A brief in context explanation is here:
> https://en.wikipedia.org/wiki/Timsort#Individual_merges .  Would such a
> trick have a place in glibc or is the overhead likely too great in the
> general case?

It's possible. We have a tunable framework for changing runtime
parameters like this based on application needs.  The interface only
allows you to make the choice once at startup. So it may not be
sufficient to meet the needs of all users. However some choice in this
area may be better than no choice.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
  2018-07-03 20:04 ` Carlos O'Donell
@ 2018-07-03 20:10   ` jrmarsha
  2018-07-03 23:30     ` Carlos O'Donell
  0 siblings, 1 reply; 7+ messages in thread
From: jrmarsha @ 2018-07-03 20:10 UTC (permalink / raw)
  To: Carlos O'Donell; +Cc: libc-help


>> Hello all,
>>
>> I've been implementing a new version of Timsort in C++ using more standard
>> facilities.  One thing I've come across which adds a divide between a larger
>> codebase and faster code is the implementation of std::inplace_merge with
>> sufficient memory.  This goes to the specific functions
>> __move_merge_adaptive, and __move_merge_adaptive_backward.  In particular,
>> there is an optimization where each insertion searches for the maximum
>> number of elements it can move.  A brief in context explanation is here:
>> https://en.wikipedia.org/wiki/Timsort#Individual_merges .  Would such a
>> trick have a place in glibc or is the overhead likely too great in the
>> general case?
> It's possible. We have a tunable framework for changing runtime
> parameters like this based on application needs.  The interface only
> allows you to make the choice once at startup. So it may not be
> sufficient to meet the needs of all users. However some choice in this
> area may be better than no choice.
>
> Cheers,
> Carlos.
How would I get new code incorporated into this and make that decision 
intelligently at compile time?Â  What would be a good way to add this 
inplace_merge variant in?

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
  2018-07-03 20:10   ` jrmarsha
@ 2018-07-03 23:30     ` Carlos O'Donell
  2018-07-04 12:08       ` Adhemerval Zanella
  2018-07-20  3:48       ` Joshua Marshall
  0 siblings, 2 replies; 7+ messages in thread
From: Carlos O'Donell @ 2018-07-03 23:30 UTC (permalink / raw)
  To: jrmarsha; +Cc: libc-help

On Tue, Jul 3, 2018 at 4:10 PM, jrmarsha <jrmarsha@mtu.edu> wrote:
>
>>> Hello all,
>>>
>>> I've been implementing a new version of Timsort in C++ using more
>>> standard
>>> facilities.  One thing I've come across which adds a divide between a
>>> larger
>>> codebase and faster code is the implementation of std::inplace_merge with
>>> sufficient memory.  This goes to the specific functions
>>> __move_merge_adaptive, and __move_merge_adaptive_backward.  In
>>> particular,
>>> there is an optimization where each insertion searches for the maximum
>>> number of elements it can move.  A brief in context explanation is here:
>>> https://en.wikipedia.org/wiki/Timsort#Individual_merges .  Would such a
>>> trick have a place in glibc or is the overhead likely too great in the
>>> general case?
>>
>> It's possible. We have a tunable framework for changing runtime
>> parameters like this based on application needs.  The interface only
>> allows you to make the choice once at startup. So it may not be
>> sufficient to meet the needs of all users. However some choice in this
>> area may be better than no choice.
>>
>> Cheers,
>> Carlos.
>
> How would I get new code incorporated into this and make that decision
> intelligently at compile time?  What would be a good way to add this
> inplace_merge variant in?

If you are looking specifically at std::inplace_merge, then you need to talk
to the GCC developers, since libstdc++ is part of GCC.
Please see: https://gcc.gnu.org/

If you are interested in similar strategies for the C library, then you can look
at the qsort() API, which can be implemented in any way really as long as
it meets the API requirements (and is written in C). In glibc you might use a
tunable to switch between one algorithm and another. You can see this
implementation in stdlib/qsort.c in glibc, which does implement quicksort.

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
  2018-07-03 23:30     ` Carlos O'Donell
@ 2018-07-04 12:08       ` Adhemerval Zanella
  2018-07-20  3:48       ` Joshua Marshall
  1 sibling, 0 replies; 7+ messages in thread
From: Adhemerval Zanella @ 2018-07-04 12:08 UTC (permalink / raw)
  To: libc-help

On 03/07/2018 20:30, Carlos O'Donell wrote:
> On Tue, Jul 3, 2018 at 4:10 PM, jrmarsha <jrmarsha@mtu.edu> wrote:
>>
>>>> Hello all,
>>>>
>>>> I've been implementing a new version of Timsort in C++ using more
>>>> standard
>>>> facilities.  One thing I've come across which adds a divide between a
>>>> larger
>>>> codebase and faster code is the implementation of std::inplace_merge with
>>>> sufficient memory.  This goes to the specific functions
>>>> __move_merge_adaptive, and __move_merge_adaptive_backward.  In
>>>> particular,
>>>> there is an optimization where each insertion searches for the maximum
>>>> number of elements it can move.  A brief in context explanation is here:
>>>> https://en.wikipedia.org/wiki/Timsort#Individual_merges .  Would such a
>>>> trick have a place in glibc or is the overhead likely too great in the
>>>> general case?
>>>
>>> It's possible. We have a tunable framework for changing runtime
>>> parameters like this based on application needs.  The interface only
>>> allows you to make the choice once at startup. So it may not be
>>> sufficient to meet the needs of all users. However some choice in this
>>> area may be better than no choice.
>>>
>>> Cheers,
>>> Carlos.
>>
>> How would I get new code incorporated into this and make that decision
>> intelligently at compile time?  What would be a good way to add this
>> inplace_merge variant in?
> 
> If you are looking specifically at std::inplace_merge, then you need to talk
> to the GCC developers, since libstdc++ is part of GCC.
> Please see: https://gcc.gnu.org/
> 
> If you are interested in similar strategies for the C library, then you can look
> at the qsort() API, which can be implemented in any way really as long as
> it meets the API requirements (and is written in C). In glibc you might use a
> tunable to switch between one algorithm and another. You can see this
> implementation in stdlib/qsort.c in glibc, which does implement quicksort.

In fact I think we should use a O(1) space algorithm for qsort() instead
of current strategy of quicksort plus mergesort.  This allows qsort() to
be fully AS-Safe and AC-Safe.  

I would also avoid adding a tunable for qsort, if the idea is to provide 
different algorithm I would rather use FreeBSD strategy to provide explicit
implementations [1].  I think it is not a good expectation to set the
algorithms expected complexity, in both runtime and memory consumption, 
by an external environment (for instance user might use qsort in a signal
handler and it might start to fail due malloc usage if the tunable is
set).

I have sent patchset upstream for review to refactor qsort [2]. It changed
to use the internal quicksort implementation as default (since our 
implementation has the optimization to use O(1) space) plus some small
optimization to the generic case (which also seems to most used in my
analysis).  Ondrej has raised some concerns about setting qsort to use
quicksort as default which has O(n^2) in worst-case performance as a 
potential issue and I replied that we might try to get quicksort to get 
O(n*ln(n)) statistically as a QoI. I my experiments I used the Mersenner 
Twister implementation I used for benchmark as the source of entropy for 
pivot selection, but we might also try to use the new arc4random instead 
(although Florian has stated it might require more work to use internally).

Another option, which I considering for next submission for 2.29, is to
just smoothsort algorithm as an alternative implementation.  It has also
O(1) space utilization, however it is faster only for already sorted input 
being slower for random, mostly sorted or repeated inputs. For reference
I have pushed the implementation I measured against on personal branch [4].

Or we can just use a simple heapsort, as Linux kernel does internally,
which is slower than both heapsort and quicksort, but the code is a lot
simpler than both.

There is also the block sort, which have O(n*ln(n)) worse-case complexity
and O(1) utilization, but I have not evaluated it. Its description seems 
rather complex, so I am not sure if it worth the trouble.

[1] https://www.freebsd.org/cgi/man.cgi?query=mergesort&sektion=3
[2] https://sourceware.org/ml/libc-alpha/2018-01/msg00626.html
[3] https://sourceware.org/ml/libc-alpha/2018-02/msg00358.html
[4] https://sourceware.org/git/gitweb.cgi?p=glibc.git;a=shortlog;h=refs/heads/azanella/qsort-smooth

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
  2018-07-03 23:30     ` Carlos O'Donell
  2018-07-04 12:08       ` Adhemerval Zanella
@ 2018-07-20  3:48       ` Joshua Marshall
  2018-07-23 20:57         ` Carlos O'Donell
  1 sibling, 1 reply; 7+ messages in thread
From: Joshua Marshall @ 2018-07-20  3:48 UTC (permalink / raw)
  To: Carlos O'Donell, adhemerval.zanella; +Cc: libc-help

Humor me here, it looks like I was somehow dropped from the chain and I'm
not sure how to get it back on track.

It looks like you're focusing on the preferred sort implementation while I
was more focusing on if Timsort had a better general case approach to
merging runs of numbers.  It does a kind of exponential probing thing which
_should_ be about as fast for switching between merging sides often and
provide substantial benefits to long chains to be merged.

As for the actual implementations of std::sort, and std::stable_sort, isn't
introspective sort used and not quick sort + merge sort?  In any case, I'm
more interested in the std::stable_sort side of things and am nearly
finished with a Timsort implementation with is parallelizable but doesn't
outperform the current std::stable_sort I think in large part because of
the inplace merging.  For that matter, I can't seem to track down what the
actual sort type used for std::stable_sort is.



On Tue, Jul 3, 2018 at 7:30 PM, Carlos O'Donell <carlos@systemhalted.org>
wrote:

> On Tue, Jul 3, 2018 at 4:10 PM, jrmarsha <jrmarsha@mtu.edu> wrote:
> >
> >>> Hello all,
> >>>
> >>> I've been implementing a new version of Timsort in C++ using more
> >>> standard
> >>> facilities.  One thing I've come across which adds a divide between a
> >>> larger
> >>> codebase and faster code is the implementation of std::inplace_merge
> with
> >>> sufficient memory.  This goes to the specific functions
> >>> __move_merge_adaptive, and __move_merge_adaptive_backward.  In
> >>> particular,
> >>> there is an optimization where each insertion searches for the maximum
> >>> number of elements it can move.  A brief in context explanation is
> here:
> >>> https://en.wikipedia.org/wiki/Timsort#Individual_merges .  Would such
> a
> >>> trick have a place in glibc or is the overhead likely too great in the
> >>> general case?
> >>
> >> It's possible. We have a tunable framework for changing runtime
> >> parameters like this based on application needs.  The interface only
> >> allows you to make the choice once at startup. So it may not be
> >> sufficient to meet the needs of all users. However some choice in this
> >> area may be better than no choice.
> >>
> >> Cheers,
> >> Carlos.
> >
> > How would I get new code incorporated into this and make that decision
> > intelligently at compile time?  What would be a good way to add this
> > inplace_merge variant in?
>
> If you are looking specifically at std::inplace_merge, then you need to
> talk
> to the GCC developers, since libstdc++ is part of GCC.
> Please see: https://gcc.gnu.org/
>
> If you are interested in similar strategies for the C library, then you
> can look
> at the qsort() API, which can be implemented in any way really as long as
> it meets the API requirements (and is written in C). In glibc you might
> use a
> tunable to switch between one algorithm and another. You can see this
> implementation in stdlib/qsort.c in glibc, which does implement quicksort.
>
> Cheers,
> Carlos.
>

^ permalink raw reply	[flat|nested] 7+ messages in thread

* Re: Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward
  2018-07-20  3:48       ` Joshua Marshall
@ 2018-07-23 20:57         ` Carlos O'Donell
  0 siblings, 0 replies; 7+ messages in thread
From: Carlos O'Donell @ 2018-07-23 20:57 UTC (permalink / raw)
  To: Joshua Marshall, Carlos O'Donell, adhemerval.zanella; +Cc: libc-help

On 07/19/2018 11:48 PM, Joshua Marshall wrote:
> As for the actual implementations of std::sort, and std::stable_sort, isn't
> introspective sort used and not quick sort + merge sort?  In any case, I'm
> more interested in the std::stable_sort side of things and am nearly
> finished with a Timsort implementation with is parallelizable but doesn't
> outperform the current std::stable_sort I think in large part because of
> the inplace merging.  For that matter, I can't seem to track down what the
> actual sort type used for std::stable_sort is.

If you are interested in std::stable_sort you must discuss this with the
C++ maintainers for the implementation you're using. If it's a GNU Toolchain
system then it's likely gcc (libstdc++).

You can reach out to them on gcc-help (gcc-help@gcc.gnu.org).

Cheers,
Carlos.

^ permalink raw reply	[flat|nested] 7+ messages in thread

end of thread, other threads:[~2018-07-23 20:57 UTC | newest]

Thread overview: 7+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2018-07-03  4:16 Considering merge optimizations from Timsort in __move_merge_adaptive, __move_merge_adaptive_backward jrmarsha
2018-07-03 20:04 ` Carlos O'Donell
2018-07-03 20:10   ` jrmarsha
2018-07-03 23:30     ` Carlos O'Donell
2018-07-04 12:08       ` Adhemerval Zanella
2018-07-20  3:48       ` Joshua Marshall
2018-07-23 20:57         ` Carlos O'Donell

This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).