public inbox for libc-alpha@sourceware.org
 help / color / mirror / Atom feed
From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
Subject: Re: Review of the qsort situation
Date: Mon, 4 Dec 2023 11:32:26 -0300	[thread overview]
Message-ID: <badb4868-9359-4090-9893-3a6564a7b953@linaro.org> (raw)
In-Reply-To: <87fs0ixghf.fsf@oldenburg.str.redhat.com>



On 04/12/23 10:08, Florian Weimer wrote:
> Last week, I promised to look at the qsort situation.  I see two major
> issues.
> 
> (a) Compatibility impact of the new implementation
> 
> We saw two different kinds of breakage in LLVM, a hang in 389-ds-base
> (the LDAP server), and a test suite failure in notmuch.  I'm pretty sure
> this is just the surface, and there is much breakage we don't know about
> yet.  Particularly in proprietary software.
> 
> So we'd probably have to retain the old implementation and add a symbol
> version.  Not great, but manageable.

Are still compatibility issues besides the sort callback being called
with same pointer (which should be fixed yet)?

I tried to model the introsort based on libstdc++ std::sort, so it is
kinda surprising that we are seeing extra compatibilities issues.

In any case, this still a latent issues that was not triggered before
because the quicksort was not taken in most cases.  On a memory pressure
situation, the quicksort will be used and we caller will face the same
compatibility issues.

> 
> (b) Performance of the new information
> 
> I did not know this previously, but mergesort is rather close to the
> theoretical minimum in terms of comparisons.  Quicksort uses something
> 39% more comparisons on average.
> 
> I believe minimizing the number of comparisons is a reasonable goal
> given how qsort is typically used: the indirect call has a certain
> overhead, and sorting variable-length strings is not uncommon, either.
> 
> As an example, I looked at the stringtable sorting in ldconfig.  On my
> system, there are 4,049 strings to sort for tail merging.  Because it's
> targeting tail merging, this sort does not use strcmp, but a reverse
> string comparison.  (It would probably be faster to reverse the strings,
> sort, and reverse the strings again.)
> 
> The number of comparisons and the median run-times (for the fixed input
> table) are
> 
>   master:                         57,384   0.657ms
>   master with qsort from 2.38:    43,426   0.543ms
>   master heapsort only:           84,507   0.921ms
>     + Kuan-Wei Chiu patch:        49,917   0.858ms
> 
> The numbers fluctuate quite a bit, I assume the branch predictor is
> struggling.  They contradict somewhat that minimizing comparisons is
> important.
> 
> I can try to gather more performance numbers, but I don't think this
> looks good.  It's slower, and there are far-ranging compatibility
> issues.  So I don't think this is the right direction.  If quicksort was
> actually faster than mergesort, it might be a different discussion.

Yes, the quicksort performance is usually worse than mergesort and I did
advertise it on cover letter.  The change was more motivated to remove
the malloc usage, so the function is fully AS-signal-safe and to simplify
corner case where memory pressure might trigger quicksort and its O(n^2)
worse case.

So we can add back a compatibility symbol or even just revert this change,
but it does not fully solve the corner case issue on what to do if malloc
fails. Should we abort, like gcc_sort does on gcc? Should we fallback to
heapsort (it might the best option in this case)?

POSIX does not really define qsort to be sync-signal-safe, although all
other implementations I am aware of makes it (usually by using quicksort
with O(1) space requirement).

  reply	other threads:[~2023-12-04 14:32 UTC|newest]

Thread overview: 4+ messages / expand[flat|nested]  mbox.gz  Atom feed  top
2023-12-04 13:08 Florian Weimer
2023-12-04 14:32 ` Adhemerval Zanella Netto [this message]
2023-12-05 10:40   ` Florian Weimer
2023-12-05 11:43 ` Sam James

Reply instructions:

You may reply publicly to this message via plain-text email
using any one of the following methods:

* Save the following mbox file, import it into your mail client,
  and reply-to-all from there: mbox

  Avoid top-posting and favor interleaved quoting:
  https://en.wikipedia.org/wiki/Posting_style#Interleaved_style

* Reply using the --to, --cc, and --in-reply-to
  switches of git-send-email(1):

  git send-email \
    --in-reply-to=badb4868-9359-4090-9893-3a6564a7b953@linaro.org \
    --to=adhemerval.zanella@linaro.org \
    --cc=fweimer@redhat.com \
    --cc=libc-alpha@sourceware.org \
    /path/to/YOUR_REPLY

  https://kernel.org/pub/software/scm/git/docs/git-send-email.html

* If your mail client supports setting the In-Reply-To header
  via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).