From: Adhemerval Zanella Netto <adhemerval.zanella@linaro.org>
To: Florian Weimer <fweimer@redhat.com>, libc-alpha@sourceware.org
Subject: Re: Review of the qsort situation
Date: Mon, 4 Dec 2023 11:32:26 -0300 [thread overview]
Message-ID: <badb4868-9359-4090-9893-3a6564a7b953@linaro.org> (raw)
In-Reply-To: <87fs0ixghf.fsf@oldenburg.str.redhat.com>
On 04/12/23 10:08, Florian Weimer wrote:
> Last week, I promised to look at the qsort situation. I see two major
> issues.
>
> (a) Compatibility impact of the new implementation
>
> We saw two different kinds of breakage in LLVM, a hang in 389-ds-base
> (the LDAP server), and a test suite failure in notmuch. I'm pretty sure
> this is just the surface, and there is much breakage we don't know about
> yet. Particularly in proprietary software.
>
> So we'd probably have to retain the old implementation and add a symbol
> version. Not great, but manageable.
Are still compatibility issues besides the sort callback being called
with same pointer (which should be fixed yet)?
I tried to model the introsort based on libstdc++ std::sort, so it is
kinda surprising that we are seeing extra compatibilities issues.
In any case, this still a latent issues that was not triggered before
because the quicksort was not taken in most cases. On a memory pressure
situation, the quicksort will be used and we caller will face the same
compatibility issues.
>
> (b) Performance of the new information
>
> I did not know this previously, but mergesort is rather close to the
> theoretical minimum in terms of comparisons. Quicksort uses something
> 39% more comparisons on average.
>
> I believe minimizing the number of comparisons is a reasonable goal
> given how qsort is typically used: the indirect call has a certain
> overhead, and sorting variable-length strings is not uncommon, either.
>
> As an example, I looked at the stringtable sorting in ldconfig. On my
> system, there are 4,049 strings to sort for tail merging. Because it's
> targeting tail merging, this sort does not use strcmp, but a reverse
> string comparison. (It would probably be faster to reverse the strings,
> sort, and reverse the strings again.)
>
> The number of comparisons and the median run-times (for the fixed input
> table) are
>
> master: 57,384 0.657ms
> master with qsort from 2.38: 43,426 0.543ms
> master heapsort only: 84,507 0.921ms
> + Kuan-Wei Chiu patch: 49,917 0.858ms
>
> The numbers fluctuate quite a bit, I assume the branch predictor is
> struggling. They contradict somewhat that minimizing comparisons is
> important.
>
> I can try to gather more performance numbers, but I don't think this
> looks good. It's slower, and there are far-ranging compatibility
> issues. So I don't think this is the right direction. If quicksort was
> actually faster than mergesort, it might be a different discussion.
Yes, the quicksort performance is usually worse than mergesort and I did
advertise it on cover letter. The change was more motivated to remove
the malloc usage, so the function is fully AS-signal-safe and to simplify
corner case where memory pressure might trigger quicksort and its O(n^2)
worse case.
So we can add back a compatibility symbol or even just revert this change,
but it does not fully solve the corner case issue on what to do if malloc
fails. Should we abort, like gcc_sort does on gcc? Should we fallback to
heapsort (it might the best option in this case)?
POSIX does not really define qsort to be sync-signal-safe, although all
other implementations I am aware of makes it (usually by using quicksort
with O(1) space requirement).
next prev parent reply other threads:[~2023-12-04 14:32 UTC|newest]
Thread overview: 4+ messages / expand[flat|nested] mbox.gz Atom feed top
2023-12-04 13:08 Florian Weimer
2023-12-04 14:32 ` Adhemerval Zanella Netto [this message]
2023-12-05 10:40 ` Florian Weimer
2023-12-05 11:43 ` Sam James
Reply instructions:
You may reply publicly to this message via plain-text email
using any one of the following methods:
* Save the following mbox file, import it into your mail client,
and reply-to-all from there: mbox
Avoid top-posting and favor interleaved quoting:
https://en.wikipedia.org/wiki/Posting_style#Interleaved_style
* Reply using the --to, --cc, and --in-reply-to
switches of git-send-email(1):
git send-email \
--in-reply-to=badb4868-9359-4090-9893-3a6564a7b953@linaro.org \
--to=adhemerval.zanella@linaro.org \
--cc=fweimer@redhat.com \
--cc=libc-alpha@sourceware.org \
/path/to/YOUR_REPLY
https://kernel.org/pub/software/scm/git/docs/git-send-email.html
* If your mail client supports setting the In-Reply-To header
via mailto: links, try the mailto: link
Be sure your reply has a Subject: header at the top and a blank line
before the message body.
This is a public inbox, see mirroring instructions
for how to clone and mirror all data and code used for this inbox;
as well as URLs for read-only IMAP folder(s) and NNTP newsgroup(s).